Introduction

This book is intended to be a technical support for students in finance, which
is why it is entitled "Probability for finance". Our purpose is to
provide the essential tools of probability theory needed to understand
financial models. Consequently, almost all the examples illustrating probability
results are taken from the fields of economics and finance. We therefore
assume that readers have an elementary knowledge of finance and microeconomics,
as well as of elementary linear algebra and analysis.

Since the seminal work of Markowitz (1952) on portfolio diversification,
mathematical models of financial markets have developed tremendously. The
Capital Asset Pricing Model of Sharpe (1964), Lintner (1965) and Mossin
(1966) was established in the sixties and continuous-time finance also started
at the end of the same decade (Merton, 1969, 1971). Option pricing models,
following the Black-Scholes-Merton model (Black and Scholes, 1973; Merton,
1973), have given rise to a systematic mathematical approach to the pricing
of derivative contracts. Sophisticated financial products have been created;
they have generated a demand for valuation models. These models are
essentially based on mathematics and, more precisely, on probability theory
and stochastic processes.

Today, any finance student has to deal with many mathematical concepts,
some of them very sophisticated and going beyond what is taught
in undergraduate programs in economics and management. This book tries
to fill the gap between what students actually know and what they should
know to enter the universe of financial models. One of our objectives is to
present these tools in a pedagogical way, but this does not mean that the
reading will be easy. Hard work is required to use the tools effectively.

The book is divided into four chapters. Chapter 1 is devoted to probability
spaces and random variables. Its purpose is to explain how to describe
the uncertainty on financial markets and to specify how prices and returns
can be written in a mathematically consistent way. Prices and returns can
also be summarized by some numbers measuring their average value, the
dispersion of possible future values or the relationship between the returns of
different stocks. These quantities are called the moments of the probability
distribution of prices or returns. They are presented in chapter 2.

Beyond the synthetic presentation of random variables through moments,
a more detailed approach is necessary to specify how possible future values
are spread along the real line. It is therefore useful to characterize
the probability distributions of economic variables like returns, interest rates
or exchange rates. This is done in chapter 3, where we present the essential
distributions appearing in the financial literature.

Finally, economic agents acquire (costly or freely) information over time. 
New information changes beliefs about the likelihood of future events or, in 
other words, changes the perceived probabilities of possible future events. 
The probability distributions of relevant economic variables are then mod- 
ified. It is the reason why a part of chapter 4 is devoted to conditional 
distributions and conditional expectations. 

The second part of this last chapter introduces limit theorems and
convergence, in order to make a smooth transition between one-period models
and multi-period models 1 . In fact, there are essentially two categories of
financial models, distinguished by the way time is measured.
In discrete-time models, markets are open on a (finite or countable) set of
dates, whereas in continuous-time models, markets are always open. It is then
important to check whether a continuous-time model is the limit (in a sense to be
defined) of a sequence of discrete-time models in which the duration between
two transaction dates shrinks to zero.



1 These multiperiod models and the corresponding mathematical tools are described in 
Roger (2010). 
Chapter 1

Probability spaces and random variables

1.1 Measurable spaces and probability measures

To start with a simple approach, we assume that economic agents live in a
one-period economy with a starting date t = 0 and an end date T = 1. Some
financial securities (assets) are traded on the market at date 0 and generate
payoffs at date 1. The description of these payoffs and the valuation of
the corresponding securities at date 0 are the essential building blocks of a
financial model.

Depending on the number of assets and on the complexity you desire for
the model, you will allow a number of possible terminal situations for the
payoffs and, more generally, for the whole economy. This set of possibilities
is called the set of states of nature and denoted $\Omega$ 1 . $\Omega$ may be finite or
infinite depending on how you want to describe the market. The subsets of
$\Omega$ are made of states of nature which describe information about the possible
situation at date T. For example, if there is only one risky asset traded on
the market, a range of terminal prices for this asset is associated with a subset
of states of nature. For technical reasons we do not detail here, not all the
subsets of $\Omega$ can be considered in a model when $\Omega$ is "too large" 2 . If $\mathcal{P}(\Omega)$
denotes the set of subsets of $\Omega$, we restrict the model to subsets satisfying
some reasonable properties that allow us to define a probability measure. This is
the reason why we need the (maybe) abstract concept of $\sigma$-algebra 3 .

1 In mathematical books devoted to probability theory, $\Omega$ is often called the sample
space of a random experiment. In our context the random experiment in which we pick
some situations is the economy or the financial market.

2 When $\Omega$ is not countable, a probability measure cannot, in general, be defined on
$\mathcal{P}(\Omega)$ in a consistent way. It will be justified in section 1.1.3.

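1.1.1 $\sigma$-algebra (or tribe) on a set $\Omega$

Definition 1 Let $\Omega$ denote a set of states of nature and $\mathcal{P}(\Omega)$ the set of
subsets of $\Omega$; a $\sigma$-algebra (also called a tribe) on $\Omega$ is a subset $\mathcal{A}$ of $\mathcal{P}(\Omega)$
satisfying:

1) $\Omega \in \mathcal{A}$

2) $\forall B \in \mathcal{A}$, $B^c \in \mathcal{A}$, where $B^c$ is the complement of $B$ defined by
$B^c = \{\omega \in \Omega \mid \omega \notin B\}$ ($\mathcal{A}$ is then closed under complementation).

3) For any sequence $(B_n, n \in \mathbb{N})$ of elements of $\mathcal{A}$, $\bigcup_{n=1}^{\infty} B_n \in \mathcal{A}$. In other
words, $\mathcal{A}$ is closed under countable unions.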

The pair $(\Omega, \mathcal{A})$ is called a measurable space and the elements of $\mathcal{A}$ are
called events. An event containing only one state of nature is an elementary
event.

At date T = 1, only one elementary event (a state of nature) $\omega$ occurs.
An event $A$ is said to be true if $\omega \in A$ and false if $\omega \notin A$.

Even when $\Omega$ is finite, it is possible to define several tribes on $\Omega$. For
example, if $\Omega$ contains 4 states, that is $\Omega = \{\omega_1, \omega_2, \omega_3, \omega_4\}$, we could choose
$\mathcal{A} = \{\emptyset, \Omega\}$, which is the simplest (and the smallest) tribe on $\Omega$, or
$\mathcal{A}' = \{\emptyset, \{\omega_1, \omega_2\}, \{\omega_3, \omega_4\}, \Omega\}$, or $\mathcal{A} = \mathcal{P}(\Omega)$, etc.
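On a finite set, the three points of definition 1 can be checked mechanically. The following Python sketch is only an illustration: the function name `is_tribe` and the state labels `w1`, ..., `w4` are assumptions made for the example, and the check of point 3 relies on the fact that, on a finite set, closure under pairwise unions is enough.

```python
from itertools import combinations

def is_tribe(omega, family):
    """Check the three points of definition 1 for a family of subsets of a finite set."""
    omega = frozenset(omega)
    family = {frozenset(event) for event in family}
    # Point 1: the whole set belongs to the family
    if omega not in family:
        return False
    # Point 2: closed under complementation
    if any(omega - event not in family for event in family):
        return False
    # Point 3: on a finite set, closure under pairwise unions is enough
    return all(a | b in family for a, b in combinations(family, 2))

omega = {"w1", "w2", "w3", "w4"}
trivial = [set(), omega]
a_prime = [set(), {"w1", "w2"}, {"w3", "w4"}, omega]
not_a_tribe = [set(), {"w1"}, omega]   # the complement of {w1} is missing

print(is_tribe(omega, trivial))       # True
print(is_tribe(omega, a_prime))       # True
print(is_tribe(omega, not_a_tribe))   # False
```

The last family fails because it contains {w1} but not its complement.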

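From definition 1, we easily get the following proposition.

Proposition 2 Let $\mathcal{A}$ be a tribe on $\Omega$;

1) for any sequence $(B_n, n \in \mathbb{N})$ of elements in $\mathcal{A}$, the intersection
$\bigcap_{n \in \mathbb{N}} B_n \in \mathcal{A}$ ($\mathcal{A}$ is closed under countable intersections).

2) $\emptyset \in \mathcal{A}$.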



3 The symbol $\sigma$ is often used in finance to denote the standard deviation of the return on
a financial security (see chapter 2). Here, it has nothing to do with this usual interpretation;
$\sigma$-algebra is simply the usual term in probability theory for the notion defined below.

In a financial model, a tribe describes all the possible information
conveyed about the state of nature which will eventually occur at the terminal
date. To illustrate the point, we assume in a first step that $\Omega$ is finite.

Definition 3 $\Gamma = \{B_1, \dots, B_K\}$ is a partition of $\Omega$ if:

1) $B_i \cap B_j = \emptyset$ when $i \neq j$

2) $\bigcup_{k=1}^{K} B_k = \Omega$



The information held at date 0 by an economic agent may be represented
by a subset $A$ included in $\Omega$. It means the agent knows that the true state
of nature is in $A$. However, this information does not completely remove
uncertainty, because $A$ may contain several states of nature.

Note that the set of all possible unions of elements of $\Gamma$, including $\Omega$ and
$\emptyset$, is a tribe, called the tribe generated by $\Gamma$ (the proof is left as an exercise).
In fact, if a given set $B_j$ of the partition is true (meaning that the state of
nature which will occur is in $B_j$), all the unions of elements of $\Gamma$ containing
$B_j$ are also true and all the unions of elements of $\Gamma$ which do not contain $B_j$
are known to be false. Obviously, $\emptyset$ is always false and $\Omega$ is always true.

Definition 4 Let $\Gamma = \{B_1, \dots, B_K\}$ be a finite partition of $\Omega$; the tribe
generated by $\Gamma$, denoted as $\mathcal{B}_\Gamma$, is the smallest tribe containing all the elements
of $\Gamma$.

The following proposition summarizes the properties of $\mathcal{B}_\Gamma$.

Proposition 5 a) The elements of $\mathcal{B}_\Gamma$ are $\emptyset$, $\Omega$, and all the unions of
elements of $\Gamma$.

b) Every tribe on a finite set $\Omega$ is generated by a partition.

c) $\mathcal{B}_\Gamma$ has $2^K$ elements 4 .
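Points (a) and (c) can be illustrated numerically by listing all the unions of blocks of a small partition. This is a minimal sketch; the function name `generated_tribe` and the three-block partition are hypothetical choices made for the example.

```python
from itertools import combinations

def generated_tribe(partition):
    """Return all unions of blocks of a finite partition (the empty union is the empty set)."""
    blocks = [frozenset(block) for block in partition]
    tribe = set()
    for size in range(len(blocks) + 1):
        for chosen in combinations(blocks, size):
            tribe.add(frozenset().union(*chosen))
    return tribe

partition = [{"w1"}, {"w2", "w4"}, {"w3"}]   # K = 3 blocks
tribe = generated_tribe(partition)
print(len(tribe))  # 8
```

With K = 3 blocks, the sketch returns $2^3 = 8$ events, including the empty set and the whole set.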

1.1.2 Sub-tribes of $\mathcal{A}$

In multi-period financial models (that is, when T > 1) with a finite number
of states of nature, the natural tribe to be chosen at the terminal date T is
$\mathcal{P}(\Omega)$ since it contains all the elementary events. However, at date t < T,
some uncertainty remains and it is relevant to consider sub-tribes of $\mathcal{P}(\Omega)$.

Definition 6 A subset $\mathcal{A}'$ of $\mathcal{P}(\Omega)$ is a sub-tribe of $\mathcal{A}$ if $\mathcal{A}'$ is a subset of
$\mathcal{A}$ containing $\Omega$ and satisfying points (1) to (3) of definition 1, where $\mathcal{A}$ is
replaced by $\mathcal{A}'$.

In other words, $(\Omega, \mathcal{A}')$ is itself a measurable space since $\mathcal{A}'$ satisfies all
the properties of a tribe. For example, when $\Omega = \{\omega_1, \omega_2, \omega_3, \omega_4\}$, the tribe
$\mathcal{A}' = \{\emptyset, \{\omega_1, \omega_2\}, \{\omega_3, \omega_4\}, \Omega\}$ is a sub-tribe of $\mathcal{P}(\Omega)$.

It is easy to check that the three properties of definition 1 are satisfied.
First, $\Omega \in \mathcal{A}'$; second, for any event $B$ in $\mathcal{A}'$, $B^c$ is in $\mathcal{A}'$ since
$\{\omega_1, \omega_2\} = \{\omega_3, \omega_4\}^c$ and $\emptyset = \Omega^c$. Finally, any union of elements of $\mathcal{A}'$ is an
element of $\mathcal{A}'$ because $\{\omega_1, \omega_2\} \cup \{\omega_3, \omega_4\} = \Omega$.

When $\Omega$ is a finite set, we saw in proposition 5 that any tribe is generated
by a partition. Therefore, we can establish a link between two tribes $\mathcal{A}$ and
$\mathcal{A}'$ such that $\mathcal{A}' \subset \mathcal{A}$ and the partitions $\Gamma$ and $\Gamma'$ generating these tribes.



4 Generally speaking, even if $Card(\Omega)$ is infinite ($Card(\Omega)$ denotes the number of
elements of $\Omega$), $Card(\Omega) < Card(\mathcal{P}(\Omega))$. This result is due to Georg Cantor (1845-1918); it
explains why, in a preceding comment, we mentioned that if $\Omega$ is not countable, $\mathcal{P}(\Omega)$ is
"too large".

Definition 7 A partition $\Gamma$ is finer than a partition $\Gamma'$ if every element of
$\Gamma'$ is a union of elements of $\Gamma$. The partition $\Gamma$ is then called a refinement 5
of $\Gamma'$.

Proposition 8 Let $\mathcal{A}'$ be a sub-tribe of $\mathcal{A}$; the partition $\Gamma$ generating $\mathcal{A}$ is a
refinement of the partition $\Gamma'$ generating $\mathcal{A}'$.

This proposition is a consequence of proposition 5. Every element of $\Gamma'$
belongs to $\mathcal{A}'$, hence to $\mathcal{A}$, and by point (a) every element of $\mathcal{A}$ is a union of
elements of $\Gamma$. Moreover, by point (c), the number of elements in a tribe is
always $2^K$, where $K$ is the number of elements of the partition generating the
tribe. As $\mathcal{A}'$ is a sub-tribe of $\mathcal{A}$, it contains no more elements than $\mathcal{A}$, and
the same holds for the partition by which it is generated.



Example 9 One of the most popular models to describe the time evolution
of stock prices is the so-called binomial model (Cox-Ross-Rubinstein, 1979).
The price at a given date is obtained by multiplying the preceding price by $u$
($d$), meaning a price increase (decrease).

Let $\Omega = \{uu; ud; du; dd\}$ denote the set of possible trajectories of a stock
price in the two-period binomial model. $\mathcal{A}' = \{\emptyset; \{uu; ud\}; \{du; dd\}; \Omega\}$ is
a tribe on $\Omega$ and a sub-tribe of $\mathcal{P}(\Omega)$. In fact, $\{du; dd\} = \{uu; ud\}^c$ and $\Omega$ is
the complement of the empty set; moreover, $\{uu; ud\} \cup \{du; dd\} = \Omega \in \mathcal{A}'$.
The price process of the stock is described in the figure below, where it is
assumed that the initial price is equal to 1.



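[Figure (1.1): two-period binomial tree starting from an initial price of 1; the price is u after an up-move at date 1 (d after a down-move); the terminal nodes at date 2 are $uu = u^2$, $ud$, $du$ and $dd = d^2$.]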



We observe that the subset $\{uu; ud\}$ corresponds to the two price paths
starting with an up-move. In the financial model, it simply means that after
one period during which an up-move has been observed, investors know that
the final state will be an element of $\{uu; ud\}$. The same remark is valid
for the other subset when the stock price decreases at the end of the first
period. Obviously, considered as numbers, the products $ud$ and $du$ are equal.
However, when $ud$ denotes a state of nature (corresponding to an up-move
followed by a down-move), it is not the same as $du$. We will come back to
this point in the next section.

5 Symmetrically, $\Gamma'$ is less fine than $\Gamma$.



Example 10 If $\Omega = \mathbb{R}$, the smallest tribe containing all open intervals is
called the Borel tribe on $\Omega$ and denoted $\mathcal{B}_{\mathbb{R}}$. It is the commonly used tribe when
one deals with $\mathbb{R}$ or an interval of $\mathbb{R}$. As it is a tribe, $\mathcal{B}_{\mathbb{R}}$ also contains all
countable unions of open intervals, all closed intervals... and more generally
all the subsets of the real line we need in a financial model.

Even if the concept of tribe may seem abstract, the reader will understand
in the next subsection why it is required to define a probability
measure correctly.

1.1.3 Probability measures 

During 2009, many economists were asked to provide predictions about the 
recovery of the economy in the months to come. They had comments like 
"a strong recovery is unlikely" or "we can expect a slow recovery in 2010". 
Likelihood of an event is usually measured by a number between 0 and 1. 
In the formalism of probability theory, the mapping linking events to such 
numbers is named a probability measure and is defined as follows. 

Definition 11 Let $(\Omega, \mathcal{A})$ be a measurable space; a probability measure on
$\mathcal{A}$ is a mapping $P$ from $\mathcal{A}$ to [0; 1] satisfying:

a) $P(\Omega) = 1$

b) For any sequence $(B_n, n \in \mathbb{N})$ of disjoint 6 events in $\mathcal{A}$:

$$P\left(\bigcup_{n \in \mathbb{N}} B_n\right) = \sum_{n \in \mathbb{N}} P(B_n)$$

c) The triple $(\Omega, \mathcal{A}, P)$ is called a probability space. The event $\Omega$ is
called the sure event and $\emptyset$ the impossible event.

A probability measure being defined on events, it is necessary that a
countable union of events is in the tribe for (b) not to be meaningless.
Similarly, if we consider an event $B$ and its complement $B^c$, point (b) implies

$$P(B) + P(B^c) = P(\Omega) = 1$$

from which we deduce $P(B^c) = 1 - P(B)$. The probability of the complement
of a given event $B$ is naturally defined, as soon as $B^c$ is in the tribe. These
remarks explain why defining tribes or $\sigma$-algebras was necessary.

The following proposition summarizes the properties induced by definition 11.

6 Two events are disjoint if their intersection is empty.
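Proposition 12 Let $(\Omega, \mathcal{A}, P)$ denote a probability space:

1) $P(\emptyset) = 0$

2) $\forall (B_1, B_2) \in \mathcal{A} \times \mathcal{A}$, $B_1 \subset B_2 \Rightarrow P(B_1) \leq P(B_2)$

3) Let $(B_n, n \in \mathbb{N})$ be an increasing sequence ($B_n \subset B_{n+1}$) of elements in $\mathcal{A}$:

$$\lim_{n \to \infty} P(B_n) = P\left(\bigcup_{n \in \mathbb{N}} B_n\right)$$

4) Let $(B_n, n \in \mathbb{N})$ be a decreasing sequence ($B_n \supset B_{n+1}$) of elements in $\mathcal{A}$:

$$\lim_{n \to \infty} P(B_n) = P\left(\bigcap_{n \in \mathbb{N}} B_n\right)$$

5) $\forall B \in \mathcal{A}$, $P(B^c) = 1 - P(B)$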

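Proof. 1) $\Omega$ and $\emptyset$ are disjoint, therefore $P(\Omega \cup \emptyset) = P(\Omega) + P(\emptyset) =
P(\Omega) = 1$. It implies $P(\emptyset) = 0$.

2) $B_1 \subset B_2 \Rightarrow P(B_2) = P(B_1 \cup (B_2 \cap B_1^c)) = P(B_1) + P(B_2 \cap B_1^c) \geq P(B_1)$

3) As $(B_n, n \in \mathbb{N})$ is increasing, the sequence $u_n = P\left(\bigcup_{p=1}^{n} B_p\right) = P(B_n)$ is
increasing and has an upper bound (it is lower than or equal to $P(\Omega) = 1$); it then has
a limit. But $(B_n, n \in \mathbb{N})$ being increasing for inclusion, the limit is nothing
else than $P\left(\bigcup_{n \in \mathbb{N}} B_n\right)$: this follows from point (b) of definition 11 applied to
the disjoint events $B_1, B_2 \cap B_1^c, B_3 \cap B_2^c, \dots$, whose union is $\bigcup_{n \in \mathbb{N}} B_n$.

4) As $(B_n, n \in \mathbb{N})$ is decreasing, the sequence $v_n = P\left(\bigcap_{p=1}^{n} B_p\right) = P(B_n)$ is
decreasing and has a lower bound (it is greater than or equal to $P(\emptyset) = 0$); it then has
a limit. But $(B_n, n \in \mathbb{N})$ being decreasing for inclusion, the limit is nothing
else than $P\left(\bigcap_{n \in \mathbb{N}} B_n\right)$ (apply point 3 to the increasing sequence of
complements $B_n^c$ and use point 5).

5) Point (b) of definition 11 implies $P(B \cup B^c) = P(B) + P(B^c)$ since $B$
and $B^c$ are disjoint; as $B \cup B^c = \Omega$, we deduce $P(B \cup B^c) = P(\Omega) = 1$,
which leads to $P(B^c) = 1 - P(B)$. ■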
Example 13 Let $Card(\Omega) = N$ and $\mathcal{A} = \mathcal{P}(\Omega)$; the uniform probability
measure on $\mathcal{A}$ is the one which gives to every elementary event the same
weight, that is 7 :

$$\forall \omega \in \Omega, \quad P(\omega) = \frac{1}{N}$$

This probability measure appears in simple experiments like coin tosses or
simulation problems.

Example 14 Let now $\Omega$ be the unit square $[0; 1] \times [0; 1]$; it is an uncountable
subset of $\mathbb{R}^2$; it has to be equipped with the Borel $\sigma$-algebra, that is the smallest
tribe containing all open sets. In this framework the uniform probability
distribution is characterized in the following way. If $A$ is an event included in the
unit square $\Omega$, $P(A)$ is equal to the area of $A$. $P$ obviously satisfies $P(\Omega) = 1$;
$P$ is usually called the Lebesgue probability measure on the unit square. The
area of the union of two disjoint subsets of $[0; 1] \times [0; 1]$ is obviously equal to
the sum of the areas of these subsets.

It is important to mention that any finite or countable set of points in the
square has a Lebesgue measure equal to 0. Generally speaking, the probability
of any rectangle $B = [a; b] \times [c; d]$ included in the square is equal to
$(d - c)(b - a) \leq 1$. This remark gives the intuition behind the analogy between
the Lebesgue measure on the square and the uniform probability measure on a
finite set. Imagine that a dart is thrown at random on the square 8 ; the
probability that it falls in $B$ is equal to $(d - c)(b - a)$.
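The dart interpretation can be illustrated by a small simulation. The sketch below is only an illustration under stated assumptions: the rectangle bounds, the number of throws and the seed are arbitrary, and the function name `estimate_probability` is hypothetical. It compares the frequency of darts falling in $B$ with the area $(d - c)(b - a)$.

```python
import random

def estimate_probability(a, b, c, d, throws=200_000, seed=0):
    """Estimate P([a; b] x [c; d]) by throwing uniform darts on the unit square."""
    rng = random.Random(seed)          # fixed seed so that the run is reproducible
    hits = 0
    for _ in range(throws):
        x, y = rng.random(), rng.random()
        if a <= x <= b and c <= y <= d:
            hits += 1
    return hits / throws

# Hypothetical rectangle B = [0.2; 0.7] x [0.1; 0.5]
a, b, c, d = 0.2, 0.7, 0.1, 0.5
exact = (d - c) * (b - a)              # area of the rectangle
estimate = estimate_probability(a, b, c, d)
print(exact, estimate)
```

The estimate gets closer to the area as the number of throws grows; the convergence results of chapter 4 make this statement precise.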



1.2 Conditional probability and Bayes theorem

In the following, the probability space we refer to is $(\Omega, \mathcal{A}, P)$ even if it
is not explicitly recalled. When investors get some information, it means
that they learn something about the state which will eventually occur. For
example, they may learn that an event $B \subset \Omega$ is true. Consequently, they
change the initial probability measure defined on $\mathcal{A}$ to take into account
this new information. To formalize this process, we introduce conditional
probabilities.

7 $P(\omega)$ is a commonly used simplified notation. In fact, to be rigorous, we should write
$P(\{\omega\})$ because a probability measure is defined on events, not on states.

8 In a comparison which has become famous, Burton Malkiel (A Random Walk down
Wall Street, 1973) wrote: "A blindfolded monkey throwing darts at a newspaper's financial
pages could select a portfolio that would do just as well as one carefully selected by the
experts."



1.2.1 Independent events and independent tribes 

Definition 15 1) Two events $B_1$, $B_2$ in $\mathcal{A}$ are independent if
$P(B_1 \cap B_2) = P(B_1) \times P(B_2)$.

2) Let $B_2 \in \mathcal{A}$ such that $P(B_2) \neq 0$; the conditional probability of $B_1$
knowing $B_2$, denoted as $P(B_1 \mid B_2)$, is defined by:

$$P(B_1 \mid B_2) = \frac{P(B_1 \cap B_2)}{P(B_2)}$$

As said before, conditional probability has a natural interpretation. If
you learn that the event $B_2$ occurs, you also know that the true state of
nature is in $B_2$. Consequently, your evaluation of the probability that $B_1$
occurs is changed; the uncertainty is reduced to $B_2$, not to the whole set
$\Omega$. In particular, if $B_1 \cap B_2 = \emptyset$, you can be sure that $B_1$ will not occur.
Therefore, the conditional probability of $B_1$ will be 0.

Analogously, if the two events $B_1$ and $B_2$ are independent, the occurrence
of $B_2$ brings no information about the occurrence of $B_1$. In this case, the
conditional probability of $B_1$ knowing $B_2$ should be equal to the unconditional
probability... and it is obviously the case since:

$$P(B_1 \mid B_2) = \frac{P(B_1 \cap B_2)}{P(B_2)} = \frac{P(B_1) P(B_2)}{P(B_2)} = P(B_1)$$



Example 16 To get an easily understandable illustration of independence,
consider one more time $\Omega = [0; 1] \times [0; 1]$ equipped with its Borel tribe and the
Lebesgue probability measure. Denote $(x, y)$ a point in $\Omega$; let
$B_1 = [0; \frac{1}{2}] \times [\frac{1}{3}; 1]$ and $B_2 = [0; \frac{1}{3}] \times [0; \frac{1}{2}]$; we then have:

$$P(B_1) = \frac{1}{2} \times \frac{2}{3} = \frac{1}{3}, \qquad P(B_2) = \frac{1}{3} \times \frac{1}{2} = \frac{1}{6}$$

If $B_2$ is known to occur and if $(x, y) \in B_1$, we know that $x \in [0; \frac{1}{3}]$ and $y$
has a probability 1/3 to be in the range $[\frac{1}{3}; \frac{1}{2}]$. In fact, $(x, y) \in B_2$ means that
$y \leq \frac{1}{2}$. For $(x, y)$ to belong to $B_1$ it is also necessary that $y \geq \frac{1}{3}$; consequently,
knowing that $y \in [0; \frac{1}{2}]$ implies a 1/3 probability that $y \in [\frac{1}{3}; \frac{1}{2}]$. We then
deduce that $P(B_1 \mid B_2) = \frac{1}{3}$. As $B_1 \cap B_2 = [0; \frac{1}{3}] \times [\frac{1}{3}; \frac{1}{2}]$, we have:

$$P(B_1 \cap B_2) = \left(\frac{1}{3} - 0\right) \times \left(\frac{1}{2} - \frac{1}{3}\right) = \frac{1}{18}$$

It leads to:

$$P(B_1 \mid B_2) = \frac{1/18}{1/6} = \frac{1}{3} = P(B_1)$$

$B_1$ is then independent of $B_2$.
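The computation of this example can be reproduced with exact fractions. The sketch below assumes that the two rectangles are $B_1 = [0; \frac{1}{2}] \times [\frac{1}{3}; 1]$ and $B_2 = [0; \frac{1}{3}] \times [0; \frac{1}{2}]$; the helper names `area` and `intersection` are hypothetical and only serve the illustration.

```python
from fractions import Fraction as F

def area(rect):
    """Area of a rectangle ((x_min, x_max), (y_min, y_max)); 0 if it is empty."""
    (x0, x1), (y0, y1) = rect
    return max(x1 - x0, 0) * max(y1 - y0, 0)

def intersection(r, s):
    """Intersection of two rectangles with sides parallel to the axes."""
    (rx0, rx1), (ry0, ry1) = r
    (sx0, sx1), (sy0, sy1) = s
    return ((max(rx0, sx0), min(rx1, sx1)), (max(ry0, sy0), min(ry1, sy1)))

# Assumed rectangles of the example, as ((x_min, x_max), (y_min, y_max))
b1 = ((F(0), F(1, 2)), (F(1, 3), F(1)))
b2 = ((F(0), F(1, 3)), (F(0), F(1, 2)))

p1, p2 = area(b1), area(b2)
p12 = area(intersection(b1, b2))
print(p1, p2, p12, p12 / p2)   # 1/3 1/6 1/18 1/3
```

The last printed value is the conditional probability of $B_1$ knowing $B_2$; it coincides with the unconditional probability of $B_1$, as independence requires.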
Lemma 17 Two disjoint events with non zero probability and different from
$\Omega$ are not independent 9 .

This result is obvious because if $B_1$ and $B_2$ are disjoint with non zero
probability, their intersection is empty and has zero probability. Therefore,
the conditional probability of $B_1$ knowing $B_2$ is 0, different from the
probability of $B_1$, and the two events cannot be independent. Knowing that
$B_2$ is true implies that $B_1$ is surely false.

Independence of events is then generalized to the independence of
$\sigma$-algebras in the following way.

Definition 18 Two sub-tribes $\mathcal{G}$ and $\mathcal{G}'$ of $\mathcal{A}$ are independent if:

$$\forall B \in \mathcal{G}, \ \forall B' \in \mathcal{G}', \quad P(B \cap B') = P(B) \times P(B')$$

Two tribes are independent if any pair of events in $\mathcal{G} \times \mathcal{G}'$ is independent.
We advise readers to look for two independent sub-tribes of $\mathcal{P}(\Omega)$ on a set
$\Omega$ with 4 equally likely states. Lemma 17 may be useful to build such an
example.

9 We mention this obvious result because on many occasions we noticed that some
students had a tendency to confuse the two notions, independence and empty intersection.

1.2.2 Conditional probability measures

We mentioned before that, when an investor gets a piece of information, he
changes his beliefs. In other words, he defines a new probability measure on
$(\Omega, \mathcal{A})$. No information means that the only event you know to be true is $\Omega$
and the only event you know to be false is $\emptyset$. Consequently, getting a piece
of information means that you know some proper subsets of $\Omega$ are true while
some other non-empty subsets are false. Conditional probability measures
formalize this process.

Proposition 19 Let $B \in \mathcal{A}$ and define $\mathcal{A}_B$ by:

$$\mathcal{A}_B = \{A \cap B, \ A \in \mathcal{A}\}$$

$\mathcal{A}_B$ is a tribe on $B$. In other words, $(B, \mathcal{A}_B)$ is a measurable space.
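Proof. $B \in \mathcal{A}_B$ because $\Omega \cap B = B$; let $(C_n, n \in \mathbb{N})$ be a sequence of sets
in $\mathcal{A}_B$ written $C_n = A_n \cap B$; we can write:

$$\bigcup_{n \in \mathbb{N}} C_n = \bigcup_{n \in \mathbb{N}} (A_n \cap B) = \left(\bigcup_{n \in \mathbb{N}} A_n\right) \cap B$$

As $\mathcal{A}$ is a tribe, $\bigcup_{n \in \mathbb{N}} A_n \in \mathcal{A}$ and then $\left(\bigcup_{n \in \mathbb{N}} A_n\right) \cap B \in \mathcal{A}_B$.

Denote now $C = A \cap B \in \mathcal{A}_B$ and $C_B^c$ the complement of $C$ in $B$. We
have:

$$C_B^c = (A \cap B)^c \cap B = (A^c \cap B) \cup (B^c \cap B) = A^c \cap B \in \mathcal{A}_B$$

■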

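Proposition 20 Let $B \in \mathcal{A}$ such that $P(B) \neq 0$; the mapping denoted
$P(\cdot \mid B)$, which associates $P(B_1 \mid B)$ to any event $B_1$, is a probability
measure on $(B, \mathcal{A}_B)$.

Proof. First, remark that $P(B \mid B) = 1$. Let $(C_n, n \in \mathbb{N})$ be a sequence of
disjoint sets in $\mathcal{A}_B$; we get:

$$P\left(\bigcup_{n \in \mathbb{N}} C_n \,\Big|\, B\right) = \frac{P\left(\left(\bigcup_{n \in \mathbb{N}} C_n\right) \cap B\right)}{P(B)} = \frac{P\left(\bigcup_{n \in \mathbb{N}} (C_n \cap B)\right)}{P(B)} \quad (1.3)$$

For every $n$, $C_n \subset B$, then the last term in the right-hand side of equation
(1.3) is written as:

$$\frac{P\left(\bigcup_{n \in \mathbb{N}} C_n\right)}{P(B)} = \frac{\sum_{n \in \mathbb{N}} P(C_n)}{P(B)} = \frac{\sum_{n \in \mathbb{N}} P(C_n \cap B)}{P(B)} = \sum_{n \in \mathbb{N}} P(C_n \mid B)$$

■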

Thinking about the time dimension of financial models provides a natural
interpretation of conditional probability measures. Assume that a piece of information
is disclosed at date t which reveals the occurrence of an event $B$. Economic
agents take this information for granted and reallocate probabilities to events
conditional on the arrival of this information.

For example, statistics about the U.S. economy (production, unemployment,
etc.) are often disclosed on Fridays at 8:30 AM, before the opening of
U.S. markets. Due to the time difference, this information reaches European
markets at 1:30 PM or 2:30 PM, that is when the corresponding financial markets
are open. Investors change their beliefs according to these new pieces of
information and this may have important consequences on market prices, especially
when the disclosed information is perceived as a surprise. In other words,
after the disclosure agents work on the probability space $(B, \mathcal{A}_B, P(\cdot \mid B))$.
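On a finite probability space, the conditional probability measure can be computed explicitly. The following sketch assumes a hypothetical space made of the four paths of a two-period binomial model, taken to be equally likely; the function name `conditional_measure` is an illustrative choice.

```python
from fractions import Fraction

def conditional_measure(p, b):
    """Return P(. | B) as a dict on the states of B, for a finite probability space."""
    p_b = sum(p[state] for state in b)
    if p_b == 0:
        raise ValueError("P(B) must be non zero")
    return {state: p[state] / p_b for state in b}

# Hypothetical space: four equally likely price paths
p = {state: Fraction(1, 4) for state in ("uu", "ud", "du", "dd")}
b = {"uu", "ud"}                      # information: an up-move has been observed
p_given_b = conditional_measure(p, b)

print(sum(p_given_b.values()))        # 1
print(p_given_b["uu"])                # 1/2
```

The weights of the states in $B$ are rescaled so that they sum to 1, while the states outside $B$ no longer belong to the space.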



1.2.3 Bayes theorem

To introduce Bayes theorem, consider the following example. In a large
population of individuals, 1 in 10 000 suffers from a rare disease which
can be diagnosed by a simple test. In 1% of cases, the test result is wrong,
providing a positive result when the individual is healthy or a negative result
when he is ill. What is the probability of being ill if you receive a positive
test result?

If you think about the question too quickly (as many people do), you
could answer that you have a 99% chance of being ill. This answer is wrong; in
fact the probability is only about 1%. The question asks for the probability of being
ill, conditional on a positive test, but most people provide the probability
of getting a positive test, knowing that they are ill. To understand what is
going on, consider a set of 10 000 people, one of them being ill (the mean
proportion in the population). The test result being wrong 1% of the time,
about 100 people will get a positive result but only one person suffers from the
disease. Consequently, a positive test result means that you have about a 1% chance
of being ill.

Bayes theorem is the tool that allows us to deal correctly with this kind of
question.

Proposition 21 Let $(B_1, B_2, \dots, B_n)$ be a partition of $\Omega$ and $C \in \mathcal{A}$, all being
non zero probability events; we then get:

$$P(B_j \mid C) = \frac{P(C \mid B_j) P(B_j)}{\sum_{i=1}^{n} P(C \mid B_i) P(B_i)}$$

Proof. The subsets $B_i$ define a partition, therefore:

$$C = \bigcup_{i=1}^{n} (C \cap B_i)$$

and we deduce:

$$P(C) = \sum_{i=1}^{n} P(C \cap B_i) = \sum_{i=1}^{n} P(C \mid B_i) P(B_i) \quad (1.4)$$

Moreover

$$P(C \cap B_j) = P(C \mid B_j) P(B_j) = P(B_j \mid C) P(C) \quad (1.5)$$

Replacing in (1.5) the value of $P(C)$ given by equation (1.4) leads to the
desired result. ■



Equation (1.4) shows that the probability of an event may be written
as a weighted average of conditional probabilities, the weights being the
probabilities of the events in the partition.

Coming back to the above example, denote $C$ the set of people getting a
positive test, $B_1$ the set of ill people and $B_2 = B_1^c$ the set of healthy people.

$$P(B_1 \mid C) = \frac{P(C \mid B_1) P(B_1)}{P(C \mid B_1) P(B_1) + P(C \mid B_2) P(B_2)}$$

We know that

$$P(B_1) = 10^{-4}, \qquad P(C \mid B_1) = 0.99, \qquad P(C \mid B_2) = 0.01$$

Bayes theorem leads to:

$$P(B_1 \mid C) = \frac{0.99 \times 10^{-4}}{0.99 \times 10^{-4} + 0.01 \times (1 - 10^{-4})} \approx 0.01$$
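The same computation can be sketched in a few lines of Python. The function name `posterior` is a hypothetical choice; the inputs are the figures of the example (a prevalence of 1 in 10 000 and a 1% error rate in both directions).

```python
def posterior(prior, likelihoods):
    """Bayes theorem for a finite partition: returns the list of P(B_j | C)."""
    joint = [l * p for l, p in zip(likelihoods, prior)]   # P(C | B_i) P(B_i)
    p_c = sum(joint)                                      # P(C), as in equation (1.4)
    return [j / p_c for j in joint]

prior = [1e-4, 1 - 1e-4]          # P(B_1): ill, P(B_2): healthy
likelihoods = [0.99, 0.01]        # P(C | B_1), P(C | B_2)
p_ill_given_positive = posterior(prior, likelihoods)[0]
print(round(p_ill_given_positive, 4))   # 0.0098
```

The exact value is 1/102, slightly below the rounded figure of 1% used in the discussion.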



1.3 Random variables and probability distributions

1.3.1 Random variables and generated tribes 

Intuitively, a random variable (a future stock price or a return) is a quantity
not known in advance; in other words, its value depends on the elementary
event which will eventually occur at the terminal date T = 1. It is then
natural to describe this mathematical object as a mapping from $\Omega$ into a set
of numbers depending on the phenomenon we want to model. If, for example,
the variable is a price, the relevant set of possible values is the set of positive
real numbers 10 denoted $\mathbb{R}_+$. When the variable is a return (possibly taking
negative values) the entire real line $\mathbb{R}$ is the set of possible values 11 .

10 A stock price cannot be negative because of the limited liability of shareholders.
11 If a linear return is calculated as $(S_1 - S_0)/S_0$, the minimum possible value is -1.
However, if a logarithmic return is used, defined by $\ln(S_1/S_0)$, the minimum value is $-\infty$.

However, one of the objectives of such a financial model is to assign
probabilities to subsets of values taken by economic variables. For example, we
may want to define the probability that the next-day return on the S&P500
index will be in the range [-2%; 2%]. In the preceding section, we showed
that a probability measure must be defined on a set of events (a tribe).
The following definition translates these ideas into a consistent mathematical
language.

Definition 22 Let $(\Omega, \mathcal{A})$ and $(E, \mathcal{B})$ be two measurable spaces; a random
variable is a function $X$ defined on $\Omega$ taking values in $E$ ($X : \Omega \to E$) which
satisfies:

$$\forall B \in \mathcal{B}, \quad X^{-1}(B) \in \mathcal{A}$$

where the set $X^{-1}(B)$ is defined by $X^{-1}(B) = \{\omega \in \Omega \mid X(\omega) \in B\}$. $X$ is
also called an $\mathcal{A}$-measurable function.

Suppose that $X$ is a stock return; in this case $E = \mathbb{R}$. The definition
means that the inverse image of an interval of possible stock returns is
in the tribe $\mathcal{A}$. Based on this assumption, it is possible (starting from a
probability measure $P$ on $\mathcal{A}$) to define a probability measure $P_X$ on $\mathcal{B}_{\mathbb{R}}$ by
$P_X(B) = P(X^{-1}(B))$. This induced probability measure will be defined
more formally in the next section.
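When $\Omega$ is finite, the relation $P_X(B) = P(X^{-1}(B))$ can be computed directly. The sketch below is only an illustration: the four equally likely states, the values of the variable and the function name `induced_probability` are hypothetical.

```python
from fractions import Fraction

def induced_probability(p, x, values):
    """P_X(B) = P(X^{-1}(B)) for a random variable on a finite probability space."""
    preimage = {state for state in x if x[state] in values}   # X^{-1}(B)
    return sum(p[state] for state in preimage)

# Hypothetical example: four equally likely states and a variable taking three values
p = {"w1": Fraction(1, 4), "w2": Fraction(1, 4), "w3": Fraction(1, 4), "w4": Fraction(1, 4)}
x = {"w1": 4, "w2": 2, "w3": 1, "w4": 2}

print(induced_probability(p, x, {2}))        # 1/2
print(induced_probability(p, x, {1, 2, 4}))  # 1
```

The probability of a set of values is obtained by going back to the states of nature that produce these values.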

The notion of random variable allows us to answer the above-mentioned
questions. For example, if $X$ denotes tomorrow's closing value of the S&P500
index and if $B$ is a range of possible index values, $P_X(B)$ is the probability
that the index ends tomorrow in this range. The space $E$ in which the
random variables take their values is usually $\mathbb{R}$ or $\mathbb{R}^n$ or one of their subsets,
like $\mathbb{R}_+$ or the set $\mathbb{N}$ of positive integers. If $E \subset \mathbb{R}$, we deal with real random
variables, and if $E = \mathbb{R}^n$ we refer to random vectors.

When some information is obtained about the values of a random variable,
we deduce that some events in $\mathcal{A}$ occur. For example, if $X$ denotes a stock
return and if we know that $X$ is in $B = [-2\%; +2\%]$, we infer that the event
$X^{-1}(B)$ occurs. More generally, observing the value of a random variable
defines a list of events in $\mathcal{A}$ known to be true or false. This intuition is
formally described in the following definition.

Definition 23 Let $X$ denote a random variable defined on $(\Omega, \mathcal{A})$ and taking
values in $(E, \mathcal{B})$. The tribe generated by $X$ (denoted $\mathcal{B}_X$) is the subset of $\mathcal{A}$
defined by:

$$\mathcal{B}_X = \{A \in \mathcal{A} \mid \exists B \in \mathcal{B}, \ A = X^{-1}(B)\}$$

We let the reader check that $\mathcal{B}_X$ is a sub-tribe of $\mathcal{A}$, that is a subset of
$\mathcal{A}$ satisfying the properties of definition 1.

Calling a random variable an $\mathcal{A}$-measurable function brings to light another
important point; a function $X$ may be a random variable when $\Omega$ is equipped
with a given tribe and may not be a random variable with respect to another
tribe. This point is fundamental when modelling the evolution of financial
or economic variables over time. In fact, the information known at a given
date t defines a sub-tribe (usually denoted $\mathcal{F}_t$) of the tribe generated by
information known at date s > t. It means that agents do not forget what
they knew in the past, or in technical notation, $\mathcal{F}_t \subset \mathcal{F}_s$.

For example, when we introduced the binomial model in the preceding
section, we wrote states of nature as $uu$ or $ud$ meaning that, at date 2,
investors remember that the stock price process started with an up-move.
Finally, it is worth noting that if $\Omega$ is finite and $\mathcal{A} = \mathcal{P}(\Omega)$, any function from
$\Omega$ to $\mathbb{R}$ is a random variable. The reason is that any subset of $\Omega$ and, a
fortiori, the inverse image of any interval, is in $\mathcal{A}$.
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Let us now illustrate these points with a simple example. Let Card(Q) = 
4 and X,Y two random variables defined in table 1.1. The main difference 
between the two variables is related to the information they convey when 
we observe their values. All possible values of X are different; consequently, 
the state of nature is identified as soon as a value X(u) is observed. On the 
contrary, suppose you observe Y(u) = 2. You cannot infer if the true state 
is U2 or CJ4. In more technical words, the tribe generated by Y is included in 
the one generated by X. 



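State         X    Y
$\omega_1$    1    4
$\omega_2$    6    2
$\omega_3$    2    1
$\omega_4$    3    2

Table 1.1: Definition of X and Y

More precisely, we can write:

$$\mathcal{B}_X = \mathcal{P}(\Omega)$$

$$\mathcal{B}_Y = \{\emptyset, \Omega, \{\omega_1\}, \{\omega_2, \omega_4\}, \{\omega_3\}, \{\omega_1, \omega_3\}, \{\omega_1, \omega_2, \omega_4\}, \{\omega_3, \omega_2, \omega_4\}\}$$

We observe that $\mathcal{B}_Y$ does not separate states 2 and 4, simply because
$Y$ takes the same value on the two states$^{12}$.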
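
On a finite set of states, the tribe generated by a variable can be enumerated
mechanically: its atoms are the level sets of the variable, and its events are
all the unions of atoms. The Python sketch below is only an illustration under
stated assumptions: the states are labelled w1 to w4, the values are those of
table 1.1, and the helper name generated_tribe is ours, not a library function.

```python
from itertools import combinations

def generated_tribe(omega, X):
    """Return the tribe generated by X on a finite set omega.

    The atoms are the level sets {X = x}; the tribe is the set of all
    unions of atoms (the empty union gives the empty set).
    """
    atoms = {}
    for w in omega:
        atoms.setdefault(X[w], set()).add(w)
    atoms = [frozenset(a) for a in atoms.values()]
    tribe = set()
    for r in range(len(atoms) + 1):
        for combo in combinations(atoms, r):
            tribe.add(frozenset().union(*combo))
    return tribe

omega = ["w1", "w2", "w3", "w4"]               # assumed labels for the states
X = {"w1": 1, "w2": 6, "w3": 2, "w4": 3}       # values from table 1.1
Y = {"w1": 4, "w2": 2, "w3": 1, "w4": 2}

B_X = generated_tribe(omega, X)
B_Y = generated_tribe(omega, Y)
print(len(B_X), len(B_Y))   # 16 8
print(B_Y <= B_X)           # True: the tribe of Y is included in the tribe of X
```

The event {w2} belongs to the tribe of X but not to the tribe of Y, which is the
formal counterpart of the statement that Y does not separate states 2 and 4.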

12 This kind of remark led Ross (1976) to show that index options are more
"efficient" than options on individual stocks to complete a financial market.

Example 24 Consider one more time the binomial evolution of a stock
price; let $S_t$ be the date-$t$ price of the stock:

$$S_t = \begin{cases} u S_{t-1} & \text{with probability } p \\ d S_{t-1} & \text{with probability } 1 - p \end{cases}$$

A two-period (three dates) model is represented by
$\Omega = \{uu; ud; du; dd\}$, corresponding to the possible paths of the stock
price. At date 0, agents know nothing about future prices, so the tribe
generated by the initial price $S_0$ is $\mathcal{B}_0 = \{\emptyset, \Omega\}$.
The only two events agents know to be true or false are $\emptyset$ and
$\Omega$.

At date 1, the price $S_1$ is observed and everybody knows which one of the
two events $\{du; dd\}$ or $\{uu; ud\}$ occurs. The first (second) event means
that a down (up) state has been observed. The tribe $\mathcal{B}_1$ is then
defined by:

$$\mathcal{B}_1 = \{\emptyset, \Omega, \{du; dd\}, \{uu; ud\}\}$$

It is a refinement of $\mathcal{B}_0$ since $\mathcal{B}_0 \subset \mathcal{B}_1$.
It is nevertheless worth noting that, even if $\mathcal{B}_1$ is a list of
events relevant at date 1, it is built at date 0.

Finally, at date 2, an interesting phenomenon appears if the final price
is $S_2 = udS_0$. It is not possible to know exactly the trajectory if $S_1$ has
been "forgotten". In fact, an up-state followed by a down-state leads to the
same final price as the reverse sequence (down-state followed by an up-state).
Consequently, the relevant tribe $\mathcal{B}_2$ is the one generated by the
pair of variables $(S_1, S_2)$ and not by $S_2$ only. We let the reader
determine $\mathcal{B}_{S_2}$ and show that this tribe is strictly included in
the one generated by $S_1$ and $S_2$.
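
The comparison between the tribe generated by the final price alone and the one
generated by the whole price path can be checked by enumeration. The sketch
below is a minimal illustration, assuming hypothetical values u = 2, d = 0.5 and
an initial price of 100; the helper generated_tribe is our own name.

```python
from itertools import combinations

def generated_tribe(omega, f):
    # Atoms are the level sets of f; the tribe is the set of all unions of atoms.
    atoms = {}
    for w in omega:
        atoms.setdefault(f(w), set()).add(w)
    atoms = [frozenset(a) for a in atoms.values()]
    return {frozenset().union(*c)
            for r in range(len(atoms) + 1)
            for c in combinations(atoms, r)}

u, d, S0 = 2, 0.5, 100          # hypothetical parameters (exact in floating point)
omega = ["uu", "ud", "du", "dd"]

def S1(w):
    return S0 * (u if w[0] == "u" else d)

def S2(w):
    return S1(w) * (u if w[1] == "u" else d)

B_1 = generated_tribe(omega, S1)
B_S2 = generated_tribe(omega, S2)
B_2 = generated_tribe(omega, lambda w: (S1(w), S2(w)))

print(len(B_1), len(B_S2), len(B_2))   # 4 8 16
print(B_S2 < B_2)                      # True: strict inclusion
print(B_1 <= B_S2)                     # False: S2 alone "forgets" the first move
```

With these values, the paths ud and du give the same final price, so they form
a single atom of the tribe generated by the final price.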

1.3.2 Independent random variables 

Recall that two events $A$ and $B$ are independent if
$P(A \cap B) = P(A) \times P(B)$ (definition 15).

Definition 25 Two random variables $X$ and $Y$ defined on
$(\Omega, \mathcal{A}, P)$ and taking values in $(E, \mathcal{B})$ are
independent if for any pair $(A, B) \in \mathcal{B}^2$, the events $X^{-1}(A)$
and $Y^{-1}(B)$ are independent.

Example 26 Let $\Omega = \{\omega_1, \omega_2, \omega_3, \omega_4\}$,
$\mathcal{A} = \mathcal{P}(\Omega)$ and assume that the four states are equally
likely$^{13}$. Let $X$ and $Y$ be two random variables defined by:

State         X    Y
$\omega_1$    1    1
$\omega_2$   -1    2
$\omega_3$    1    2
$\omega_4$   -1    1

The question of independence can first be examined in an intuitive manner
by asking whether knowing the value taken by one of the two variables brings
information on the value taken by the other. When $Y = 1$, $X$ is equal to 1 or
-1 with equal probabilities (states $\omega_1$ and $\omega_4$). But when
$Y = 2$, we get exactly the same possible results for $X$ (states $\omega_2$
and $\omega_3$). Consequently, knowing $Y$ doesn't change the probabilities of
the events in $\mathcal{B}_X$. The same arguments could be applied by
exchanging the roles of the two variables.

The independence of $X$ and $Y$ can be easily proved by using definition
25. The important point is that the values of $X$ and $Y$ are not fundamental.
What is crucial is the information revealed by the variables. In other words,
if $X$ and $Y$ are independent, it is also the case for $aX$ and $bY$ where $a$
and $b$ are non-zero real numbers. The following proposition links independence
of variables and independence of the tribes generated by these variables.

13 To be precise we should say "elementary events" instead of "states".

Proposition 27 Two random variables $X$ and $Y$ are independent if and
only if the generated tribes $\mathcal{B}_X$ and $\mathcal{B}_Y$ are independent.
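
Definition 25 can be checked by brute force on a finite space. The sketch below
is an illustration only: it assumes the values X = (1, -1, 1, -1) and
Y = (1, 2, 2, 1) on four equally likely states labelled w1 to w4, and the
function names are ours. On a finite space it is enough to test the events
{X = x} and {Y = y}, because every event of the generated tribes is a disjoint
union of such events.

```python
from fractions import Fraction
from itertools import product

omega = ["w1", "w2", "w3", "w4"]
P = {w: Fraction(1, 4) for w in omega}            # equally likely states
X = {"w1": 1, "w2": -1, "w3": 1, "w4": -1}
Y = {"w1": 1, "w2": 2, "w3": 2, "w4": 1}

def prob(event):
    """Probability of the set of states w for which event(w) is true."""
    return sum(P[w] for w in omega if event(w))

def independent(U, V):
    # Compare P(U = u, V = v) with P(U = u) P(V = v) for every pair of values.
    return all(
        prob(lambda w: U[w] == u and V[w] == v)
        == prob(lambda w: U[w] == u) * prob(lambda w: V[w] == v)
        for u, v in product(set(U.values()), set(V.values()))
    )

aX = {w: 3 * X[w] for w in omega}     # a = 3, arbitrary non-zero constant
bY = {w: -2 * Y[w] for w in omega}    # b = -2, arbitrary non-zero constant

print(independent(X, Y))     # True
print(independent(aX, bY))   # True: rescaling does not change the information
print(independent(X, X))     # False: a non-constant variable depends on itself
```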

1.3.3 Probability distributions and cumulative distributions

A real random variable $X$ is, as defined before, a function defined on a
measurable space $(\Omega, \mathcal{A})$ taking values in
$(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$. Any random variable $X$ then allows us
to define a probability measure on $\mathcal{B}_{\mathbb{R}}$ starting from the
probability measure $P$ on $\mathcal{A}$.

Definition 28 Let $X$ denote a random variable defined on
$(\Omega, \mathcal{A})$ with range in $(E, \mathcal{B})$;

a) The probability distribution of $X$ is a probability measure $P_X$ on
$\mathcal{B}$, defined by:

$$\forall B \in \mathcal{B},\ P_X(B) = P(X^{-1}(B)) = P(\{\omega \in \Omega \,/\, X(\omega) \in B\})$$

b) When $(E, \mathcal{B}) = (\mathbb{R}, \mathcal{B}_{\mathbb{R}})$, the
cumulative distribution function of $X$, denoted $F_X$, is a function from
$\mathbb{R}$ to $[0; 1]$ defined by:

$$F_X(x) = P(\{\omega \in \Omega \,/\, X(\omega) \leq x\}) = P_X((-\infty; x])$$

$X(\omega)$    Proba
0              $\binom{7}{3} / \binom{10}{3} = 35/120$
1              $3 \times \binom{7}{2} / \binom{10}{3} = 63/120$
2              $3 \times \binom{7}{1} / \binom{10}{3} = 21/120$
3              $1 / \binom{10}{3} = 1/120$

Table 1.2: Probability measure induced by X



This definition shows why it is interesting to define probabilities of events
linked to economic variables. The probability measure defined on $\mathcal{A}$
is an abstraction because $\Omega$ is a general space describing all possible
economic situations. In real-life problems, probabilities of the $P_X$-type,
and the corresponding cumulative distribution functions, are often used in
place of $P$.



Example 29 Consider a simplified lotto game with 10 numbers, players
choosing 3 numbers among the 10. Let $X$ denote the random variable counting
the number of correct numbers on a given ticket, the official draw being
given. The relevant set $\Omega$ is the set of triples of different numbers
between 1 and 10.

Denote $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ the number of combinations of $k$
numbers taken from a set of $n$ numbers. The set $\Omega$ then has
$\binom{10}{3} = \frac{10!}{3!\,7!} = 120$ elements. As $\Omega$ is finite, we
can choose $\mathcal{A} = \mathcal{P}(\Omega)$ and $P(\omega) = \frac{1}{120}$.

The probability measure $P_X$ is built as follows. First remark that $X$ can
take only 4 values from 0 to 3. Table 1.2 gives the probabilities of the
events$^{14}$ $\{X = k\}$ for $k = 0, ..., 3$.

There is only one combination with three correct numbers, so
$P(X = 3) = \frac{1}{120}$. Concerning $\{X = 2\}$, there are 3 possible pairs
of winning numbers and we then have to draw one number among the 7 losing
numbers. Consequently $P(X = 2) = \frac{3 \times 7}{120} = \frac{21}{120}$.
Using the same argument for $\{X = 1\}$, there are three possible choices for
1 winning number among the 3 and $\binom{7}{2} = 21$ pairs of losing numbers;
we then deduce $P(X = 1) = \frac{3 \times 21}{120} = \frac{63}{120}$. Finally,
$P(X = 0) = \binom{7}{3} / \binom{10}{3} = 35/120$. We can check immediately:

$$63 + 35 + 21 + 1 = 120 \tag{1.6}$$

14 Remember that $\{X = k\}$ is a shortcut for
$\{\omega \in \Omega \text{ such that } X(\omega) = k\}$.

This simple example first shows how to build the space
$(\Omega, \mathcal{A}, P)$ and, second, how to use binomial coefficients to
define $P_X$. Moreover, $\mathcal{B}_X$ is a proper sub-tribe of $\mathcal{A}$,
meaning that $\mathcal{B}_X \subsetneq \mathcal{A}$. Even if a player is only
interested in the number of winning numbers on his ticket, telling him this
number does not completely reveal the state of nature (the official draw),
except if he picked the three correct numbers.
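
The probabilities of the lotto example can be recovered both by enumerating the
120 triples and by the binomial-coefficient formula. The sketch below is a
minimal illustration; the reference triple {1, 2, 3} is an arbitrary assumption
(any other triple gives the same counts).

```python
from fractions import Fraction
from itertools import combinations
from math import comb

reference = {1, 2, 3}                            # arbitrary fixed triple
triples = list(combinations(range(1, 11), 3))    # the 120 elements of Omega

# Brute force: count the triples by number of numbers shared with the reference.
counts = [0, 0, 0, 0]
for triple in triples:
    counts[len(reference & set(triple))] += 1

# Closed form: choose k of the 3 correct numbers and 3 - k of the 7 others.
P_X = [Fraction(comb(3, k) * comb(7, 3 - k), comb(10, 3)) for k in range(4)]

print(len(triples))   # 120
print(counts)         # [35, 63, 21, 1]
print(sum(P_X))       # 1
```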

Proposition 30 The cumulative distribution function (CDF) $F_X$ of a
random variable $X$ is an increasing, right-continuous function, satisfying:

$$\lim_{x \to -\infty} F_X(x) = 0 \quad \text{and} \quad \lim_{x \to +\infty} F_X(x) = 1$$

Proof. $F_X$ is increasing due to point (2) of proposition 12
($B_1 \subset B_2 \Rightarrow P(B_1) \leq P(B_2)$). If $x < y$, we have
$(-\infty; x] \subset (-\infty; y]$ and then
$P_X((-\infty; x]) \leq P_X((-\infty; y])$.

Right-continuity comes directly from the definition of $F_X(x)$ as the
probability of the interval $(-\infty; x]$, closed on the right. In fact, let
$(x_n, n \in \mathbb{N})$ denote a decreasing sequence converging to $x$. The
sequence $B_n = (-\infty; x_n]$ is decreasing and converges to
$B = (-\infty; x]$. Point (4) of proposition 12 allows us to conclude.

The results about the two limits can be proved with the same approach,
that is, by using sequences going to $-\infty$ or $+\infty$ when $n$ tends to
infinity. ■

Example 31 In finance, the CDF is used to define a popular risk measure,
namely Value at Risk or VaR$^{15}$. Banks must hold sufficient capital to
face potential portfolio losses. To value the amount of required capital, the
VaR(99%) is commonly used. It is defined as the number $x$ such that:

$$P(X > x) = 1 - F_X(x) = 0.99$$

where $X$ denotes the return of the portfolio over a given period of time.

When $X$ follows a continuous distribution, we get:

$$VaR(99\%) = F_X^{-1}(0.01)$$

In other words, with a probability of 99%, the potential loss is lower
than $|x|$.

15 See Jorion (2000) for developments on VaR.

Example 32 Stochastic dominance is a concept which allows us to rank
financial assets$^{16}$. It is defined as follows: a financial asset which pays
$X$ dominates another financial asset paying $Y$, for stochastic dominance of
degree 1, if:

$$\forall x \in \mathbb{R},\ F_X(x) \leq F_Y(x)$$

where $F_X$ and $F_Y$ are the CDFs of $X$ and $Y$.

It can be proved that when $X$ dominates $Y$, all agents characterized by a
strictly increasing utility function prefer $X$ to $Y$, independently of their
risk attitude. This result is intuitive because the above inequality is
equivalent to:

$$P(\{X > x\}) \geq P(\{Y > x\})$$

Consequently, for any value $x$, the probability of getting a payoff greater
than $x$ is at least as large for asset $X$ as for asset $Y$. It explains the
expression "stochastic dominance".

16 See Huang-Litzenberger (1988), chapter 2.
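
Examples 31 and 32 both reduce to simple operations on a CDF. The sketch below
is only an illustration under loudly stated assumptions: the returns used for
the VaR are supposed Gaussian with mean 0 and standard deviation 2% (hypothetical
figures), and the two payoffs compared for dominance are hypothetical discrete
distributions. The function names cdf and dominates are ours.

```python
from fractions import Fraction
from statistics import NormalDist

# VaR(99%) as the 1% quantile of the return distribution (example 31).
# Assumption for illustration only: Gaussian returns, mean 0, std 2%.
returns = NormalDist(mu=0.0, sigma=0.02)
var_99 = returns.inv_cdf(0.01)
print(round(var_99, 4))                      # -0.0465
print(round(1 - returns.cdf(var_99), 2))     # 0.99

def cdf(dist, x):
    """CDF at x of a discrete distribution given as {value: probability}."""
    return sum(p for v, p in dist.items() if v <= x)

def dominates(dist_x, dist_y):
    # Both CDFs are step functions, so comparing them at the support points
    # of the two distributions is enough (example 32).
    points = sorted(set(dist_x) | set(dist_y))
    return all(cdf(dist_x, x) <= cdf(dist_y, x) for x in points)

# Hypothetical payoffs: X puts more probability on high values than Y.
X = {1: Fraction(1, 5), 2: Fraction(3, 10), 3: Fraction(1, 2)}
Y = {1: Fraction(1, 2), 2: Fraction(3, 10), 3: Fraction(1, 5)}
print(dominates(X, Y), dominates(Y, X))      # True False
```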
1.3.4 Discrete and continuous random variables

Definition 33 a) A random variable $X$ is discrete if there exists a sequence
$(x_n, n \in \mathbb{N})$ such that:

$$\sum_{n \in \mathbb{N}} P(\{\omega \in \Omega \,/\, X(\omega) = x_n\}) = \sum_{n \in \mathbb{N}} P(X = x_n) = 1$$

$(x_n, n \in \mathbb{N})$ is called the support of $X$.

b) A random variable $Y$ is continuous if there exists a positive function
$f_Y$, continuous (except at most on a finite or countable number of points)
such that:

$$F_Y(x) = \int_{-\infty}^{x} f_Y(y)\,dy$$

where $F_Y$ is the CDF of $Y$. $f_Y$ is called the probability density of $Y$
(or, in short, the density).

Remark 34 1) Proposition 30 shows that:

$$\int_{-\infty}^{+\infty} f_Y(y)\,dy = 1$$

2) When a variable $X$ is discrete, as in example 29, the CDF is a step
function which exhibits jumps at the values in the support of $X$. A discrete
variable can then be expressed as a combination of indicator functions defined
as follows.



Definition 35 Let $B \in \mathcal{A}$; the indicator function of the event $B$,
denoted $1_B$, is defined by:

$$1_B(\omega) = \begin{cases} 1 & \text{if } \omega \in B \\ 0 & \text{otherwise} \end{cases}$$

The name of these variables is natural because their value is 1 on a state
$\omega$ to indicate that $\omega$ is an element of $B$.

Example 36 Some financial assets can be modelled by means of indicator
functions. For example, the Chicago Board Options Exchange trades binary
options on the S&P500 index$^{17}$. These contracts pay \$100 to their holders
when the index is above a given value (the strike price) at the maturity date.
Suppose a strike price $K = 1000$ and denote $X_T$ the value of the S&P500
at the maturity date $T$. The payoff of the binary contract is equal to
$100 \times 1_{\{X_T > 1000\}}$ where
$\{X_T > 1000\} = \{\omega \in \Omega \,/\, X_T(\omega) > 1000\}$.

17 see http://www.cboe.com/products/indexopts/bsz_spec.aspx

More generally, indicator functions allow us to present discrete variables in
an alternative way, that is, as a linear combination of indicator functions.

Proposition 37 Let $X$ denote a discrete variable with finite support
$\{x_1, ..., x_n\}$, with $x_i \neq x_j$ for $i \neq j$; there exists a
partition $\mathcal{T} = \{B_1, ..., B_n\}$ of $\Omega$ such that:

$$X = \sum_{i=1}^{n} x_i 1_{B_i}$$

This result is obvious when defining
$B_i = \{\omega \in \Omega \,/\, X(\omega) = x_i\}$, $i = 1, 2, ..., n$.

If $\mathrm{Card}(\Omega) = N$ and $X(\omega_i) = x_i$, we get:

$$X = \sum_{i=1}^{N} x_i 1_{\{\omega_i\}} \tag{1.7}$$

In financial models with $\mathrm{Card}(\Omega) < +\infty$, the financial
assets whose payoffs may be described by $1_{\{\omega_i\}}$ are called
Arrow-Debreu securities. When all these securities are traded, equation (1.7)
shows that every financial security is a portfolio of Arrow-Debreu securities.
The market is said to be "complete" in this case.
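
The decomposition of a discrete payoff into indicator functions, and into
Arrow-Debreu securities, can be verified state by state. The sketch below is
an illustration only; the four states and the payoff values are hypothetical,
and the helper name indicator is ours.

```python
omega = ["w1", "w2", "w3", "w4"]
X = {"w1": 4, "w2": 2, "w3": 1, "w4": 2}    # hypothetical payoff in each state

def indicator(B):
    """Indicator function of the event B, stored as a dict state -> 0 or 1."""
    return {w: 1 if w in B else 0 for w in omega}

# Partition of omega into the level sets B_i = {X = x_i} (proposition 37).
partition = {}
for w in omega:
    partition.setdefault(X[w], set()).add(w)

rebuilt = {w: sum(x * indicator(B)[w] for x, B in partition.items())
           for w in omega}

# An Arrow-Debreu security pays 1 in exactly one state and 0 elsewhere.
arrow_debreu = {s: indicator({s}) for s in omega}
portfolio = {w: sum(X[s] * arrow_debreu[s][w] for s in omega) for w in omega}

print(rebuilt == X, portfolio == X)   # True True
```

Holding X(w) units of the Arrow-Debreu security of each state w therefore
replicates the payoff X, which is the content of the completeness statement.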



1.3.5 Transformations of random variables 

The question addressed in this section is the following: knowing the probabil- 
ity distribution of a given random variable X, can we deduce the probability 
distribution of the random variable Y = g(X) where g is a sufficiently regu- 
lar function? There are many economic and financial examples showing that 
this question is relevant. 

• A derivative security is a contract whose payoff is a function $g$ of the
price of an underlying asset, like a stock or an index. For example,
a call option on a stock, with exercise price $K$ and maturity $T$, is a
contract which pays $Y_T = g(X_T) = \max(X_T - K; 0)$ where $X_T$ is the
date-$T$ price of the stock.

• Another, simpler transformation lies in the link between prices
and logarithmic returns. Assume, to simplify, that a stock price is equal
to 1 at date 0 and equal to $X_t$ at date $t$. The logarithmic return on the
interval $[0; t]$ is defined by:

$$R_t = \ln(X_t)$$

The question is then to determine the probability distribution of the
return, starting from a given distribution of the price.

• In microeconomic models, the future random wealth of an agent is
transformed by a utility function to measure satisfaction. Moreover,
to compare individual risk aversions, it is necessary to study concave
transformations of utility functions.

The following proposition formally establishes the link between $f_X$ and
$f_Y$. We will illustrate this result in chapter 3 by determining the density
of a price starting from the density of a return and vice versa.

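Proposition 38 Let $X$ denote a variable with density $f_X$ and $g$ a strictly
monotone, continuously differentiable function from $\mathbb{R}$ to
$\mathbb{R}$. The density $f_Y$ of $Y = g(X)$ is defined by:

$$f_Y(y) = \begin{cases} \dfrac{f_X(g^{-1}(y))}{|g'(g^{-1}(y))|} & \text{if } y \in Y(\Omega) \\ 0 & \text{otherwise} \end{cases}$$

where $Y(\Omega) = \{y \in \mathbb{R} \,/\, y = Y(\omega) \text{ for } \omega \in \Omega\}$.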
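
The change-of-variable rule can be checked numerically in the case of a price
obtained as the exponential of a log-return. The sketch below is an
illustration under assumptions that are ours: the log-return is supposed
Gaussian with hypothetical parameters 0.05 and 0.2, and the density obtained
from the rule f_X(g^{-1}(y)) / |g'(g^{-1}(y))| is compared with a numerical
derivative of the CDF of the price.

```python
from math import exp, log
from statistics import NormalDist

X = NormalDist(mu=0.05, sigma=0.2)   # hypothetical distribution of the log-return

def g(x):
    return exp(x)                    # price as a function of the log-return

def g_inv(y):
    return log(y)

def g_prime(x):
    return exp(x)

def f_Y(y):
    """Density of Y = g(X); it is zero outside Y(Omega) = (0, +infinity)."""
    if y <= 0:
        return 0.0
    x = g_inv(y)
    return X.pdf(x) / abs(g_prime(x))

def F_Y(y):
    return X.cdf(g_inv(y))           # valid because g is increasing

y, h = 1.1, 1e-5
numerical = (F_Y(y + h) - F_Y(y - h)) / (2 * h)   # central difference
print(abs(f_Y(y) - numerical) < 1e-6)             # True
```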
Chapter 2

Moments of a random variable 



The standard theory of portfolio choice, developed in the fifties by Harry
Markowitz (1952), says that investors make a tradeoff between return and
risk when they build a portfolio. As the return over a future period of time
is a random variable, investors use a measure of return called "expected
return", which is a weighted average of possible future returns. They measure
risk by a simple function of the deviations of possible returns with respect
to the expected return. It is called the variance of returns.

Mathematically speaking, investors make a tradeoff between expectation
and variance. The notions of expectation and variance of a random variable
are then fundamental in most financial models. In this chapter, we first
distinguish discrete and continuous variables to define expectation and
variance, before presenting the general definition. We then move on to
skewness and kurtosis, which allow us to analyze more sophisticated properties
of stock returns like asymmetry or "peakedness" of the distribution of returns.

We introduce covariances and correlations, which measure the intensity
of the linear relationship between random variables. These tools are also
important in portfolio management because of the principle of diversification.
Diversifying risk allows investors to get a better expected return without
taking more risk. It is then intuitive that including several stocks in a
portfolio achieves better diversification when the returns on these stocks are
not strongly linked, that is, when the covariance between them is low.



2.1 Mathematical expectation 

The mathematical expectation of a random variable is the technical
translation of the intuitive concept of average of a sequence of numbers.

Consider one more time example 29 of the lotto ticket (chapter 1) and
assume that the organizer of the game decides to attribute an equal share of
prizes to the three categories of winners. Assume that 120 players have each
bought one ticket (which costs \$1) and played different combinations. The
amount of prizes to be shared is \$120. Each category of winners has to share
\$40. The unique winner with three correct numbers receives \$40. The 21
winners with two correct numbers receive \$40/21 each and the 63 winners with
only one correct number obtain \$40/63 each. Before the official draw, a player
who wants to value his average gain weighs the amounts he can win by the
corresponding probabilities of winning. He can then expect the following
average gain (in dollars):

$$40 \times \frac{1}{120} + \frac{40}{21} \times \frac{21}{120} + \frac{40}{63} \times \frac{63}{120} = 1$$

Obviously, in this overly simplified example, we find that the expected gain
is equal to the initial price of the ticket because we assume that all players'
stakes are redistributed (and they chose different combinations). In this case,
we can say the game is fair. Real lottos are not that fair because the organizer
keeps around half of the stakes.
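
The expected gain of the lotto player is a probability-weighted sum, which can
be reproduced exactly with rational arithmetic. The sketch below uses the
probabilities 35/120, 63/120, 21/120 and 1/120 of example 29 and the
equal-share rule described above; the 50% retention rate in the last lines is
only an illustrative assumption suggested by the remark on real lottos.

```python
from fractions import Fraction

# Probability of k correct numbers, for k = 0, 1, 2, 3.
prob = {0: Fraction(35, 120), 1: Fraction(63, 120),
        2: Fraction(21, 120), 3: Fraction(1, 120)}

# Each winning category shares 40 dollars among its winners.
winners = {1: 63, 2: 21, 3: 1}
prize = {k: Fraction(40, n) for k, n in winners.items()}
prize[0] = Fraction(0)

expected_gain = sum(prob[k] * prize[k] for k in prob)
print(expected_gain)        # 1

# Illustrative assumption: the organizer keeps half of the stakes.
unfair_gain = sum(prob[k] * prize[k] / 2 for k in prob)
print(unfair_gain)          # 1/2
```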
2.1.1 Expectations of discrete and continuous random variables

Definition 39 1) Let $X$ denote a discrete random variable with support
$\{x_1, ..., x_n\}$, $x_i \in \mathbb{R}$ for any $i$. Let $p_i = P(X = x_i)$
for $i = 1, ..., n$; the expectation of $X$ under probability $P$ (also called
"first-order moment") is the quantity (if it exists), denoted $E(X)$, and
defined by:

$$E(X) = \sum_{i=1}^{n} x_i p_i$$

2) Let $X$ be a continuous variable with density $f_X$ and CDF $F_X$; the
expectation of $X$ under probability $P$ is the quantity (if it exists),
denoted $E(X)$, defined by:

$$E(X) = \int_{-\infty}^{+\infty} x f_X(x)\,dx = \int_{-\infty}^{+\infty} x\,dF_X(x)$$

If $n$ and the $x_i$ are finite in point (1) of definition 39, the expectation
exists.

Moreover, if $X = 1_B$ then $E(X) = E(1_B) = P(B)$.
Therefore, if $X = \sum_{i=1}^{n} x_i 1_{B_i}$ with $B_i = \{X = x_i\}$, we get:

$$E(X) = \sum_{i=1}^{n} x_i P(B_i)$$

To be completely rigorous, we should write $E_P$ instead of $E$ since the
expectation depends on the probability measure $P$. However, $E$ is sufficient
when there is no ambiguity about the probability measure under which the
expectation is defined.

Nevertheless, it is worth noting that arbitrage pricing models are based
on a change of probability. So, financial models often distinguish the "real"
probability measure, denoted $P$, and the "risk-neutral" probability measure,
denoted $Q$. It explains why it is sometimes necessary to denote expectations
as $E_P$ or $E_Q$ (section 2.4 gives more details about probability changes).

Discrete and continuous variables are important cases but a general
definition is needed because many random variables are neither discrete nor
continuous. For example, if $Y$ is continuous, the call payoff
$\max(Y - K; 0)$ is in general neither discrete nor continuous: it is equal to
0 with a positive probability and is continuously distributed above 0.

2.1.2 Expectation: the general case

The expectation of a general random variable is built through a convergence
process starting with variables with finite support (as in definition 39). It
is then generalized to positive random variables and, finally, to general
variables.

Definition 40 Let $\mathcal{V}$ denote the set of random variables with finite
support$^{1}$ defined on a probability space $(\Omega, \mathcal{A}, P)$ and $X$
a positive (general) random variable. The expectation of $X$, denoted $E(X)$ or
$\int_{\Omega} X\,dP$, is the quantity (if it exists) defined by$^{2}$:

$$E(X) = \int_{\Omega} X\,dP = \sup_{Y \in \mathcal{V}} \{E(Y),\ Y \leq X\}$$

We are able to calculate $E(Y)$ for any $Y \in \mathcal{V}$ because $Y$ is
discrete (definition 39). Therefore, the above definition means that the
expectation of a positive random variable $X$ may be calculated as the upper
bound of the expectations of all the variables of $\mathcal{V}$ lower than $X$.
The definition may also be interpreted by saying that a positive random
variable can be written as the limit of a sequence of discrete random
variables. In fact, the idea behind this definition is the same as the one used
to define the Riemann integral of a function as the limit of the integrals of a
sequence of step functions.

To understand the notation $\int_{\Omega} X\,dP$, remember how we defined
$E(X)$ for a continuous variable. The expectation was the integral of $x$ with
respect to the CDF $F_X$. In the general case, the expectation is the integral
of $X$ with respect to $P$. It explains the notation as an integral on
$\Omega$.

To address the general case, that is, when the sign of $X$ is unknown, it is
enough to remark that a random variable $X$ may be decomposed as follows:

$$X = X^{+} - X^{-}$$

with $X^{+} = \max(X; 0)$ and $X^{-} = \max(-X; 0)$. The definition of $E(X)$
is then deduced from the preceding definition.

1 Variables in $\mathcal{V}$ are also called simple variables.

2 Assume that $f$ is a function; the upper bound $\sup_{x \in A} f(x)$ is the
lowest number greater than or equal to all the numbers $f(x)$ for $x \in A$.

Definition 41 Let $X$ denote a real random variable; the expectation of $X$,
denoted $E(X)$, is the quantity (if it exists) defined by:

$$E(X) = E(X^{+}) - E(X^{-})$$

This definition is consistent with definition 40 since $X^{+}$ and $X^{-}$ are
positive.

In each of the preceding definitions we mentioned that $E(X)$ may not
exist; when it exists, $X$ is said to be integrable with respect to $P$, or
simply integrable when there is no confusion about $P$. Finally, remark that
$E(X)$ exists as soon as $E(|X|)$ exists, simply because
$|X| = X^{+} + X^{-}$.
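
The construction of the expectation of a positive variable as an upper bound
over simple variables can be made concrete. The sketch below is an illustration
under assumptions of our own choosing: the state w is uniform on [0, 1],
X(w) = w^2 (so that E(X) = 1/3), and the simple variables are the dyadic
roundings Y_n = floor(2^n X) / 2^n, which satisfy Y_n <= X.

```python
from math import sqrt

def expectation_simple(n):
    """E(Y_n) for Y_n = floor(2^n X) / 2^n, X(w) = w^2, w uniform on [0, 1]."""
    m = 2 ** n
    total = 0.0
    for k in range(m):
        # Y_n = k/m on the event {k/m <= X < (k+1)/m}, whose probability is
        # sqrt((k+1)/m) - sqrt(k/m) because P(X < a) = sqrt(a) for a in [0, 1].
        total += (k / m) * (sqrt((k + 1) / m) - sqrt(k / m))
    return total

approximations = [expectation_simple(n) for n in range(1, 11)]
for n, value in enumerate(approximations, start=1):
    print(n, value)          # increasing sequence, bounded above by 1/3
```

Each Y_n takes finitely many values, so its expectation is a finite sum; the
values increase with n and approach 1/3 from below, with an error smaller than
2^(-n).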

The essential properties of expectations are summarized in the following 
proposition. 

Proposition 42 Let $X, Z$ denote two integrable random variables and $A, B$
two events in $\mathcal{A}$;

1) $X = 1_A \Rightarrow E(X) = P(A)$

2) $0 \leq X \leq Z \Rightarrow 0 \leq E(X) \leq E(Z)$

3) $\{X \geq 0 \text{ and } A \subset B\} \Rightarrow E(X 1_A) \leq E(X 1_B)$

4) $\forall c \in \mathbb{R},\ E(cX) = cE(X)$

5) $E(X + Z) = E(X) + E(Z)$

6) $|E(X)| \leq E(|X|)$

Proof. 1) $X = 1_A$ is equal to 1 with probability $P(A)$ and equal to 0 with
probability $1 - P(A)$. A direct application of definition 39 gives the result
$E(X) = P(A)$.

2) In definition 40 choose $Y = 0$ as the variable in $\mathcal{V}$. It follows
that $E(X) \geq E(Y) = 0$.

In the same way, $Z \geq X$ implies that:

$$\sup_{Y \in \mathcal{V}} \{E(Y),\ Y \leq X\} \leq \sup_{Y \in \mathcal{V}} \{E(Y),\ Y \leq Z\} \tag{2.1}$$

As a consequence, $E(Z) \geq E(X)$.

3) We know that $A \subset B$ and $X \geq 0$; consequently
$X 1_A \leq X 1_B$ and the result is a direct consequence of (2).

4) If $X \in \mathcal{V}$, the result is obvious and it is also the case if $X$
is positive and $c > 0$. More generally, decompose $X$ and $cX$ in
$X^{+} - X^{-}$ and $(cX)^{+} - (cX)^{-}$. It is worth noting that if $c$ is
negative, $(cX)^{-} = -cX^{+}$ and $(cX)^{+} = -cX^{-}$, therefore:

$$\begin{aligned} E(cX) &= E((cX)^{+}) - E((cX)^{-}) \\ &= -cE(X^{-}) + cE(X^{+}) \\ &= -c(-E(X)) = cE(X) \end{aligned}$$

5) The proof is the same as in point (4) when considering, first, positive
variables, and then decomposing $X$ in $X^{+} - X^{-}$.

6) $|X| = X^{+} + X^{-}$ implies
$E(|X|) = E(X^{+}) + E(X^{-}) \geq |E(X^{+}) - E(X^{-})|$;
in fact, for two positive numbers $x$ and $y$, we know that
$x + y \geq x - y$ and $x + y \geq y - x$. ■



Remark 43 When a sample $(x_1, x_2, ..., x_n)$ of a random variable $X$ is
observed, the expectation of $X$ is estimated by the sample mean:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

2.1.3 Illustration: Jensen's inequality and Saint-Petersburg paradox

The theory of decision making under uncertainty is based on a number of
assumptions about agents' preferences. In the usual framework of microeconomic
textbooks, agents maximize the expectation of a utility function by
choosing amounts of consumption and investment. In a one-period model,
agents consume at dates 0 and 1, date-1 consumption being financed by the
payoffs of investments chosen at date 0. It follows that date-1 consumption
is a random variable $X$ and the agent maximizes $E[u(X)]$ under a budget
constraint, where $u$ stands for his utility function.

One of the usual assumptions is that agents are risk averse. It means
that an agent being offered a lottery paying 0 or 100 with equal probabilities
would prefer, instead of playing the lottery, to get
$50 = \frac{1}{2}(0 + 100)$ for sure. In other words, such an agent prefers to
get the expectation $E(X)$ for sure, instead of getting the random consumption
$X$.

Risk aversion can then be characterized by:

$$E[u(X)] \leq u[E(X)]$$

where $X$ and $u(X)$ are assumed integrable.

Jensen's inequality in the following proposition allows us to characterize
the utility functions of risk averse agents.

Proposition 44 Let $X$ denote an integrable random variable and $u$ a
concave$^{3}$ function from $\mathbb{R}$ to $\mathbb{R}$ such that $u(X)$ is
integrable; we then have:

$$E[u(X)] \leq u[E(X)] \tag{2.2}$$

To illustrate this important result, consider a random variable taking two
values $x_1$ and $x_2$ with probabilities $p$ and $1 - p$. Jensen's inequality
writes:

$$p\,u(x_1) + (1 - p)\,u(x_2) \leq u(p x_1 + (1 - p) x_2)$$

The curve representing a concave function $u(x)$ is then always above the
line joining $(x_1, u(x_1))$ and $(x_2, u(x_2))$ between these two points.

3 A function $f$ is concave if for any $(x, y)$ and any $\lambda \in [0; 1]$,
$f(\lambda x + (1 - \lambda) y) \geq \lambda f(x) + (1 - \lambda) f(y)$.

Remark 45 a) If $u$ is convex, inequality 2.2 is reversed.

b) If $u$ is strictly concave, the inequality is strict as soon as $X$ is not
always equal to its expectation ($X$ is not a constant).

Strict concavity of utility functions has two different meanings. The first
one is that marginal utility of consumption is decreasing because $u' > 0$ and
$u'' < 0$. The more an agent consumes, the less satisfaction an additional
unit of consumption brings. This is one of the first assumptions we can find
in undergraduate economics textbooks. But this decreasing marginal utility
property has nothing to do with randomness and probability.

The second interpretation, appearing in Jensen's inequality, is risk
aversion, which is obviously linked to randomness and probability. It is then
important to understand that the same mathematical property (concavity
of the utility function) has two completely different interpretations. It is
sometimes considered as a weakness of the expected utility theory.
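
Jensen's inequality is easy to check numerically on a two-point lottery. The
sketch below is an illustration only; the lottery paying 50 or 150 with equal
probabilities and the three utility functions are hypothetical choices (the
payoff 0 is avoided because the logarithm is not defined there).

```python
from math import exp, log, sqrt

def expected_utility(u, lottery):
    """E[u(X)] for a lottery given as a list of (payoff, probability) pairs."""
    return sum(p * u(x) for x, p in lottery)

lottery = [(50.0, 0.5), (150.0, 0.5)]       # hypothetical two-point lottery
mean = sum(p * x for x, p in lottery)        # E(X) = 100

for name, u in [("log", log), ("sqrt", sqrt)]:
    print(name, expected_utility(u, lottery) <= u(mean))   # True for concave u

def square(x):
    return x * x                             # a convex function

print(expected_utility(square, lottery) >= square(mean))   # True: reversed

# Certainty equivalent under log utility: the sure amount giving the same
# expected utility as the lottery.
certainty_equivalent = exp(expected_utility(log, lottery))
print(round(certainty_equivalent, 2))        # 86.6, below E(X) = 100
```

The gap between the expectation (100) and the certainty equivalent is one way
to quantify the risk aversion embedded in the concave utility function.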

Risk aversion and concavity of utility functions can be illustrated by the
Saint-Petersburg paradox$^{4}$.

Example 46 The St-Petersburg paradox

In a fair coin tossing game, a player wins $2^n$ monetary units if Tails
appears for the first time on the $n$-th toss, and the game stops. When Heads
occurs, the coin is flipped one more time.

Let $N$ denote the random variable counting the number of tosses before
the game stops. Each coin toss being fair, the probability of getting Heads
(Tails) is 1/2. Consequently, $P(N = n) = \frac{1}{2^n}$ since we need a
sequence of $n - 1$ Heads followed by a Tails (successive tosses are
independent). But the gain is then equal to $2^n$. Therefore, the random gain
of the player, denoted $X$, has an expected value defined by:

$$E(X) = \sum_{n=1}^{+\infty} 2^n \times P(N = n) = \sum_{n=1}^{+\infty} 2^n \times \frac{1}{2^n} = +\infty$$

If economic agents were to maximize their expected wealth, they would be
ready to pay any price to play this game since the expected gain is infinite.
All experiments show that people are much more risk averse. They agree to
bet only a small amount of money to participate.

Bernoulli proposed an alternative to wealth maximization: the maximization
of the expectation of a concave transformation of wealth, namely
the logarithm of wealth. We get in this case:

$$E(\ln(X)) = \sum_{n=1}^{+\infty} \ln(2^n) \times \frac{1}{2^n} = \ln(2) \sum_{n=1}^{+\infty} \frac{n}{2^n}$$

and we know that:

$$\sum_{n=1}^{+\infty} \frac{n}{2^n} = \sum_{n=1}^{+\infty} \sum_{k=n}^{+\infty} \frac{1}{2^k} = \sum_{n=1}^{+\infty} \frac{1}{2^{n-1}} = 2$$

It follows that $E(\ln(X)) = 2\ln(2) = \ln(4)$; the player is then indifferent
between playing the game and getting 4 monetary units for sure. In other
words, with a logarithmic utility function, he is ready to pay 4 units to play,
not more, even if the expected value of the game is infinite. This famous
example shows that elements other than the expected final wealth play a role
in decision making.

4 The contribution of Daniel Bernoulli (1700-1782) concerning this paradox was
published in 1738 and then published in English in 1954 in Econometrica.
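
The contrast between the diverging expected gain and the converging expected
log-gain can be observed on partial sums. The sketch below is an illustration
only; the truncation levels 10, 20 and 40 are arbitrary, and truncating simply
means ignoring the outcomes that need more tosses.

```python
from math import log

def truncated_expectations(max_tosses):
    """Partial sums of E(X) and E(ln X) over the outcomes n = 1..max_tosses."""
    e_gain = sum(2 ** n * 0.5 ** n for n in range(1, max_tosses + 1))
    e_log = sum(log(2 ** n) * 0.5 ** n for n in range(1, max_tosses + 1))
    return e_gain, e_log

for cap in (10, 20, 40):
    e_gain, e_log = truncated_expectations(cap)
    # e_gain grows linearly with cap; e_log stabilizes near ln(4).
    print(cap, e_gain, round(e_log, 6))

print(round(log(4), 6))   # 1.386294
```

Each additional possible toss adds exactly one unit to the expected gain,
whereas its contribution to the expected log-gain shrinks geometrically.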
2.2 Variance and higher moments

As shown by the Saint Petersburg paradox, agents take into account, not 
only the expected final wealth, but also the risk associated with X. Many 
risk measures have been proposed in the financial literature but the most 
popular, especially in portfolio choice theory, is the variance of returns. This 
popularity is partly due to the statistical properties it carries and to the 
simplicity of the portfolio choice models it allows to build. The variance of 
returns is also a fundamental variable in option pricing models. 



2.2.1 Second-order moments 

Definition 47 The second-order moment of a random variable $X$, denoted
as $\mu_2(X)$, is the quantity (if it exists) defined by:

$$\mu_2(X) = E(X^2)$$

When $\mu_2(X)$ exists, $X$ is said to be square-integrable.

Definition 48 Let $X$ be a square-integrable random variable; the variance
of $X$, denoted $V(X)$ or $\sigma^2(X)$, is the quantity$^{5}$:

$$V(X) = \sigma^2(X) = E[(X - E(X))^2]$$

$V(X)$ is also called the central second-order moment. In fact, if
$Y = X - E(X)$, we get $V(X) = \mu_2(Y)$. $Y$ is a zero-mean random variable,
that is, $E(Y) = 0$.

Proposition 49 Let $X$ be a square-integrable random variable;

$$V(X) = E[(X - E(X))^2] = E(X^2) - E(X)^2 \tag{2.3}$$

Proof.

$$\begin{aligned}
E[(X - E(X))^2] &= E[X^2 - 2XE(X) + E(X)^2] && (2.4) \\
&= E(X^2) - 2E[XE(X)] + E(X)^2 && (2.5) \\
&= E(X^2) - 2E(X)^2 + E(X)^2 && (2.6) \\
&= E(X^2) - E(X)^2 && (2.7)
\end{aligned}$$

5 Now $\sigma$ has nothing to do with the $\sigma$-algebras presented in
chapter 1. We hope that this common notation will not be confusing. In general
it is not.

Example 50 Suppose $\mathrm{Card}(\Omega) = 4$, $P(\omega) = 0.25$ for every
$\omega$, and $X$ defined by:

State         X
$\omega_1$    2
$\omega_2$    3
$\omega_3$   -1
$\omega_4$    0

$E(X) = 1$ and the corresponding zero-mean variable $Y = X - E(X)$ is
equal to:

State         Y
$\omega_1$    1
$\omega_2$    2
$\omega_3$   -2
$\omega_4$   -1

The variances of $X$ and $Y$ are equal and given by:

$$V(X) = V(Y) = 0.25 \times (1^2 + 2^2 + (-2)^2 + (-1)^2) = 2.5$$

This example also shows that variance is invariant by translation. In
other words:

$$V(X) = V(X + c) \tag{2.8}$$

for any real number $c$.
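
The two expressions of the variance and its invariance by translation can be
verified with exact arithmetic. The sketch below is an illustration that
assumes X takes the values 2, 3, -1 and 0 on four equally likely states, values
chosen to be consistent with E(X) = 1 and V(X) = 2.5 in example 50; the shift
c = 10 is arbitrary.

```python
from fractions import Fraction

P = Fraction(1, 4)              # four equally likely states
X = [2, 3, -1, 0]               # assumed values of X in states w1..w4

def expectation(values):
    return sum(P * v for v in values)

def variance(values):
    m = expectation(values)
    return expectation([(v - m) ** 2 for v in values])

mean = expectation(X)
var_def = variance(X)                                     # E[(X - E(X))^2]
var_alt = expectation([v ** 2 for v in X]) - mean ** 2    # E(X^2) - E(X)^2
var_shifted = variance([v + 10 for v in X])               # V(X + c), c = 10

print(mean, var_def, var_alt, var_shifted)   # 1 5/2 5/2 5/2
```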

Remark 51 When a sample $(x_1, x_2, ..., x_n)$ of a random variable $X$ is used
in empirical studies, an unbiased estimate of the variance is given by:

$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \tag{2.9}$$

The coefficient $n - 1$ instead of $n$ comes from the fact that the expectation
of $X$ is not known and is replaced by its estimate $\bar{x}$.

Definition 52 Let $X$ be a square-integrable variable; the standard
deviation of $X$, denoted $\sigma(X)$, is the square root of $V(X)$:

$$\sigma(X) = \sqrt{V(X)}$$

In Markowitz portfolio theory, agents are assumed to make their choices
in the expectation-variance space or in the expectation-standard deviation
space. For a given expected return, they minimize the variance (standard
deviation) of their portfolio return. However, a portfolio contains several
assets and the relationships between the individual returns must be taken into
account to evaluate the variance of the portfolio$^{6}$. These relationships
are quantitatively measured by covariances and correlations. They are developed
in section 2.3.

6 To give an idea, the standard deviation of yearly U.S. stock returns was
around 20% over the 20th century.

2.2.2 Skewness and kurtosis

Variance is commonly used as a measure of risk in portfolio management.
However, variance weighs in the same way returns above the mean and
returns below the mean. Therefore, measuring risk by the variance of returns
implicitly assumes that the distribution of returns is symmetrical with respect
to the mean. In fact, it is intuitive that agents link risk more to potential
losses than to potential gains. Consequently, when the distribution is not
symmetrical, variance may not be a good measure of risk. Skewness is a
way to measure the asymmetry of a probability distribution and many
empirical studies show that stock returns are negatively skewed, meaning that
they exhibit more large negative returns than large positive returns and
more small positive returns than small negative returns.

In the next chapter, we will describe the main probability distributions 
appearing in financial models. The most commonly used to describe returns 
is the Gaussian distribution which is symmetrical with respect to its mean. 
However, when looking at stock returns, we also observe that extreme returns 
are more frequent than what is expected under the Gaussian distribution. 
Kurtosis is the standard measure to take into account these extreme returns. 
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Definition 53 The moment of order n of a random variable X, denoted 
as $\mu_n(X)$, is the quantity (if it exists) defined by:

$$\mu_n(X) = E(X^n)$$

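Definition 54 Let X denote a random variable with a finite moment of
order 3. The skewness of X, denoted Sk(X), is defined by:

$$Sk(X) = E\left[\left(\frac{X - E(X)}{\sigma(X)}\right)^3\right] \qquad (2.10)$$

In empirical studies using a sample $(x_1, x_2, \ldots, x_n)$ of a random variable
X, Sk(X) is estimated by:

$$\widehat{Sk} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{\hat{\sigma}}\right)^3$$

where $\bar{x}$ and $\hat{\sigma}$ denote the sample mean and the sample standard deviation.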




Figure 2.1: Histogram of 2002 PEPSI daily returns 

Figure 2.1 shows the histogram of daily returns on the PEPSI stock in 
2002. We observe some large negative returns, and Sk = -0.73 for that year.
Skewness is used in some portfolio choice models to take into account the
asymmetric perception of risk by investors. The thin line in figure 2.1 shows
what we could expect if the distribution of returns were Gaussian (see chapter
3), that is, symmetrical.
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Definition 55 The kurtosis of a random variable X is defined by: 

$$\kappa(X) = \frac{E\left[(X - E(X))^4\right]}{\sigma(X)^4} \qquad (2.12)$$

The excess kurtosis of a random variable X is defined by:

$$e\kappa(X) = \kappa(X) - 3 \qquad (2.13)$$

For Gaussian returns, $\kappa = 3$, which explains the way excess kurtosis
is defined. As mentioned before, $\kappa(X)$ measures the importance of extreme
returns in a probability distribution. In the case of PEPSI returns,
$\kappa(X) = 8.93$, which is very high compared with the Gaussian value of 3.
It is common to observe a large kurtosis for single stocks. When portfolios
are considered, the diversification effect often generates returns with a lower
kurtosis and a skewness closer to 0.
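
The estimators of skewness and kurtosis discussed above can be sketched in a few lines of Python. This is only an illustration: the function name `sample_moments` and the list of returns are hypothetical (they are not the PEPSI data), and the 1/n convention is assumed for the sample variance.

```python
import math

def sample_moments(xs):
    """Return (skewness, kurtosis, excess kurtosis) of a sample, 1/n convention."""
    n = len(xs)
    mean = sum(xs) / n
    # Sample variance with the 1/n convention (an assumption of this sketch).
    var = sum((x - mean) ** 2 for x in xs) / n
    sd = math.sqrt(var)
    # Average of the standardized deviations raised to the powers 3 and 4.
    skew = sum(((x - mean) / sd) ** 3 for x in xs) / n
    kurt = sum(((x - mean) / sd) ** 4 for x in xs) / n
    return skew, kurt, kurt - 3.0

# Hypothetical daily returns containing one large loss.
returns = [0.01, 0.02, -0.01, 0.00, 0.015, -0.08, 0.005, 0.01]
skew, kurt, excess = sample_moments(returns)
print(f"skewness={skew:.2f}, kurtosis={kurt:.2f}, excess kurtosis={excess:.2f}")
```

With this sample, the single large negative return is enough to make the skewness negative and the excess kurtosis positive, which is the pattern described for single stocks.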

Before studying relationships between random variables by means of co- 
variances and correlations, we give several important properties of the vector 
space of integrable/square-integrable random variables. Even if they seem 
abstract, they are very useful in general arbitrage pricing models. 



2.3 The vector space of random variables 

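Let $\mathcal{L}^0(\Omega, \mathcal{A})$ denote the set of real random variables defined on $(\Omega, \mathcal{A})$.
Addition of variables and multiplication by a real number may be intuitively
defined by:

$$\forall \omega \in \Omega, \; (X + Y)(\omega) = X(\omega) + Y(\omega)$$
$$\forall \omega \in \Omega, \forall c \in \mathbb{R}, \; (cX)(\omega) = cX(\omega)$$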

It is quite obvious that $\mathcal{L}^0(\Omega, \mathcal{A})$ is a vector space, the null random
variable being the neutral element for addition. This space is very general and
it is impossible to enrich its structure without constraints. But, if we restrict
the analysis to integrable variables, a norm can be defined on this subspace.
Moreover, considering only square-integrable random variables makes it
possible to define an inner product, inducing nice geometrical properties.

It is worth noting that integrable or square-integrable variables can be
considered only if a probability measure P has been specified. Some technical
precautions are needed to introduce P and to consider random variables as
elements of vector spaces. They are presented in the next sub-section.
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2.3.1 Almost surely equal random variables 

When we say that two n-dimensional vectors in $\mathbb{R}^n$, say x and y, are equal,
there is no ambiguity. It means that the distance d(x, y) between x and y is
zero, as soon as a metric$^7$ d has been defined on $\mathbb{R}^n$. Using the usual metric
on $\mathbb{R}^n$, it means:

$$x = y \Leftrightarrow \sum_{i=1}^{n}(x_i - y_i)^2 = 0 \qquad (2.14)$$

or, equivalently, $x_i = y_i$ for $i = 1, \ldots, n$.

Consider now the set of Riemann-integrable functions defined on an
interval $[a; b]$. The usual metric on this space is defined by:

$$d(f, g) = \int_a^b |f(x) - g(x)| \, dx \qquad (2.15)$$

In fact, it is not really a metric because we can have $d(f, g) = 0$ with
$f \neq g$. Assume $f(x) = 0$ on $[a; b]$ and $g(x) = 0$ on $[a; b[$ but $g(b) = 1$. It
turns out that $d(f, g) = 0$ because f and g only differ on a set containing
one point.

It would be the same if f and g were different on a finite or countable set
of points. To deal with a "real" metric, it is necessary to avoid these cases.
The problem is solved by defining a binary relation $\mathcal{R}$ as follows:

$f \mathcal{R} g$ if f and g are equal, except on a finite or countable set of points.

$\mathcal{R}$ is an equivalence relation. It is reflexive ($f \mathcal{R} f$), symmetric
($f \mathcal{R} g \Rightarrow g \mathcal{R} f$) and transitive ($f \mathcal{R} g$ and $g \mathcal{R} h \Rightarrow f \mathcal{R} h$).

Consequently, we do not define the metric d on the set of integrable
functions, but on the $\mathcal{R}$-equivalence classes of integrable functions defined
on $[a; b]$. If $\tilde{f}$ and $\tilde{g}$ stand for two equivalence classes containing respectively
f and g, the metric $d(\tilde{f}, \tilde{g})$ can be defined by using formula 2.15, for any pair
(f, g) of functions belonging to $\tilde{f} \times \tilde{g}$.

The same "trick" is used on the space of integrable random variables, 
using the equivalence relation "almost-surely equal". 

7 A metric d on a space S is a mapping from $S \times S$ to $\mathbb{R}_+$ such that 1) $d(x, y) = 0$
iff $x = y$, 2) $d(x, y) = d(y, x)$ and 3) $d(x, z) \leq d(x, y) + d(y, z)$ (called the triangular
inequality).
Definition 56 Two random variables X and Y defined on $(\Omega, \mathcal{A}, P)$ are
equal P-almost surely (or P-almost everywhere)$^8$ if:

$$P(\{\omega \in \Omega \mid X(\omega) = Y(\omega)\}) = 1$$

We could also write in short: $X = Y$ a.s. $\Leftrightarrow P(X = Y) = 1$.

An alternative presentation can be used, based on negligible events. 

Definition 57 Let $(\Omega, \mathcal{A}, P)$ be a probability space; an event $A \in \mathcal{A}$ is
P-negligible if $P(A) = 0$.

Definition 56 is then equivalent to saying that two variables are a.s. equal if
they differ only on a negligible event.

$\mathcal{L}^1(\Omega, \mathcal{A}, P)$ will denote the set of P-integrable random variables defined
on $(\Omega, \mathcal{A}, P)$.

Proposition 58 The binary relation $\mathcal{R}$ defined on $\mathcal{L}^1(\Omega, \mathcal{A}, P)$ by:

$$X \mathcal{R} Y \Leftrightarrow X = Y \text{ P-a.s.}$$

is an equivalence relation.



8 We note briefly P-a.s or P-a.e or simply a.e or a.s if no confusion can arise about the 
probability used in the model. 
2.3.2 The space $L^1(\Omega, \mathcal{A}, P)$

Let now $L^1(\Omega, \mathcal{A}, P)^9$ be the set of $\mathcal{R}$-equivalence classes of random variables
in $\mathcal{L}^1(\Omega, \mathcal{A}, P)$. In the same way, $L^0(\Omega, \mathcal{A}, P)$ denotes the set of $\mathcal{R}$-equivalence
classes of random variables in $\mathcal{L}^0(\Omega, \mathcal{A})$.



Proposition 59 1) $L^1(\Omega, \mathcal{A}, P)$ is a vector subspace of $L^0(\Omega, \mathcal{A}, P)$.

2) The mapping from $L^1(\Omega, \mathcal{A}, P)$ to $\mathbb{R}_+$, denoted $X \mapsto \|X\|_1$ and defined
by:

$$X \mapsto \|X\|_1 = E(|X|)$$

is a norm$^{10}$.

3) The mapping from $L^1$ to $\mathbb{R}$ which links X to E(X), denoted $X \mapsto E(X)$,
is a positive linear mapping.

Proof. 1) The fact that $L^1(\Omega, \mathcal{A}, P)$ is a vector subspace of $L^0(\Omega, \mathcal{A}, P)$ is
a direct consequence of points (4) and (5) in proposition 42.

2) We now prove that $X \mapsto \|X\|_1$ is a norm. First, $\|X\|_1 = 0 \Leftrightarrow X = 0$ P-a.s.
It is then sufficient to show that:

$$\|X + Y\|_1 \leq \|X\|_1 + \|Y\|_1$$

But for any $\omega \in \Omega$, we have $|X(\omega) + Y(\omega)| \leq |X(\omega)| + |Y(\omega)|$, therefore:

$$E(|X + Y|) \leq E(|X|) + E(|Y|)$$

It is also easy to see that $\|aX\|_1 = |a| \, \|X\|_1$ using the properties of the
absolute value function.

3) Linearity and positivity of the expectation come directly from propo- 
sition 42. ■ 

Linearity of expectations is essential for economic interpretations. In 
fact, a large part of the financial literature on arbitrage uses, as we will see 
later on, a probability change to express prices as expectations of discounted 

9 or simply L 1 if no confusion can arise about the probability measure. 
10 A norm defined on a vector space S is a mapping from S to $\mathbb{R}_+$, denoted $\|.\|$, satisfying:

a) $\|x\| = 0$ if and only if $x = 0$

b) $\forall x \in S, \forall c \in \mathbb{R}, \; \|cx\| = |c| \, \|x\|$

c) $\forall (x, y) \in S \times S, \; \|x + y\| \leq \|x\| + \|y\|$
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future cash-flows. In this framework, the linearity property simply says that 
the value of a portfolio is equal to the sum of the values of the securities 
contained in the portfolio. It is also the intuitive idea behind the no-arbitrage 
assumption. You cannot buy two identical stocks for $50 and immediately
resell them for $30 each. The price of one stock must be $25 if there are no
arbitrage opportunities.

Proposition 59 allows us to consider an integrable random variable as a
vector (an element of the vector space $L^1(\Omega, \mathcal{A}, P)$). As a norm induces a metric by
$d_1(X, Y) = \|X - Y\|_1$, $L^1(\Omega, \mathcal{A}, P)$ is also a metric space, like any normed
vector space. Convergence associated with the metric $d_1$ is called $L^1$ convergence
or convergence in mean.

Definition 60 A sequence of random variables $(X_n, n \in \mathbb{N}^*)$ converges in
$L^1$ to a limit $X \in L^1$ if and only if:

$$\lim_{n \to \infty} E(|X_n - X|) = 0$$

We then write $X_n \xrightarrow{L^1} X$.

Unfortunately, the $L^1$-norm is not derived from an inner product. Some
intuitive geometrical results, valid in finite-dimensional spaces like $\mathbb{R}^n$, are
not true in $L^1$. This is especially the case for the projection theorem and the
Riesz representation theorem. To keep these convenient properties, it is
necessary to restrict the space to square-integrable variables.



2.3.3 The space $L^2(\Omega, \mathcal{A}, P)$

Let now $\mathcal{L}^2(\Omega, \mathcal{A}, P)$ denote the vector space of square-integrable random
variables and $L^2(\Omega, \mathcal{A}, P)$ the space of equivalence classes for the binary
relation "almost-surely equal" defined on $\mathcal{L}^2(\Omega, \mathcal{A}, P)$. We get the following
proposition.

Proposition 61 1) $L^2(\Omega, \mathcal{A}, P)$ is a vector subspace of $L^1(\Omega, \mathcal{A}, P)$.

2) Let X and Y be two elements of $L^2(\Omega, \mathcal{A}, P)$; the product XY is in
$L^1(\Omega, \mathcal{A}, P)$.
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Proof. 1) It is a direct consequence of: 

a) the linearity of the expectation operator; 

b) the fact that a square-integrable variable is also integrable. 

2) We are going to show that

$$[E(XY)]^2 \leq E(X^2) E(Y^2) \qquad (2.16)$$

Let $Z = X + tY$ with $t \in \mathbb{R}$:

$$\begin{aligned} E(Z^2) &= E(X^2 + 2tXY + t^2Y^2) \geq 0 \\ &= E(X^2) + 2tE(XY) + t^2E(Y^2) \end{aligned}$$

The second line is a polynomial of degree 2 in t. It is always positive or equal
to 0, therefore its reduced discriminant $\Delta'$ is negative or equal to 0. But $\Delta'$
is equal to:

$$\Delta' = [E(XY)]^2 - E(X^2) E(Y^2)$$

This shows inequality 2.16. The RHS of this inequality is finite since X and
Y belong to $L^2$. It implies that the LHS is also finite and shows that XY is
integrable. Integrability of XY can also be checked directly from the
inequality $|XY| \leq \frac{1}{2}(X^2 + Y^2)$. ■

This proposition allows us to define an inner product on $L^2$.

Proposition 62 The mapping from $L^2 \times L^2$ to $\mathbb{R}$, denoted by $\langle ., . \rangle$ and defined
by:

$$(X, Y) \mapsto \langle X, Y \rangle = E(XY) \qquad (2.17)$$

is an inner product on $L^2$.

The induced norm is defined by:

$$\|X\|_2 = \sqrt{\langle X, X \rangle} = \sqrt{E(X^2)} \qquad (2.18)$$

and the induced metric $d_2$ is defined by $d_2(X, Y) = \|X - Y\|_2$.

Proof. The mapping $\langle ., . \rangle$ is positive because $\langle X, X \rangle = E(X^2) > 0$ if X is
not (P-a.s.) equal to 0. Bilinearity is obvious since the expectation operator
is linear. ■

As was the case for $L^1$, we can define a convergence on $L^2$, simply called
$L^2$-convergence or convergence in quadratic mean.

Definition 63 A sequence $(X_n, n \in \mathbb{N}^*)$ converges in $L^2$ to a limit $X \in L^2$
if and only if:

$$\lim_{n \to \infty} E\left[(X_n - X)^2\right] = 0$$

$L^2$ is in fact a Hilbert space$^{11}$. At first glance this distinction appears
purely technical but it has important consequences for the properties of
vectors in $L^2$. There are two important well-known theorems, valid on $\mathbb{R}^n$, that
are still true on Hilbert spaces and especially on $L^2$. These are the projection
theorem and the Riesz representation theorem.

The Riesz representation theorem 

First remember that, on $\mathbb{R}^2$, a linear mapping $f : \mathbb{R}^2 \to \mathbb{R}$ is defined by:

$$\forall x \in \mathbb{R}^2, \; f(x) = a_1 x_1 + a_2 x_2 \qquad (2.19)$$

where $a_1$ and $a_2$ are real numbers and $x' = (x_1, x_2)$. It means that the numbers
$(a_1, a_2)$ represent the mapping f. Moreover, the vector $a' = (a_1, a_2)$ has the
same dimension as the vector $x \in \mathbb{R}^2$ and f(x) can be written as the inner

11 A Hilbert space H is a normed vector space where the norm comes from an inner 
product, and the metric induced by the norm makes H complete as a metric space (any 
Cauchy sequence in H is convergent). 
product$^{12}$ of a and x. To summarize, we could say that any linear mapping f from
$\mathbb{R}^2$ to $\mathbb{R}$ is represented by a vector $a \in \mathbb{R}^2$. It is the Riesz representation
theorem in the two-dimensional space. Fortunately, this result is still valid
in $L^2(\Omega, \mathcal{A}, P)$.

Theorem 64 Let f denote a continuous linear mapping from $L^2$ to $\mathbb{R}$; there
exists $Y_f \in L^2$ such that for any $X \in L^2$:

$$f(X) = \langle X, Y_f \rangle = E(XY_f)$$

Suppose that X is the payoff of a financial asset and f(X) denotes the
initial price of this asset. The mapping $X \mapsto f(X)$ is called a valuation
operator or a pricing kernel. Assume for example that $Card(\Omega) = N$. The
preceding theorem says that there exists a vector $Y_f$ such that

$$f(X) = \langle X, Y_f \rangle = E(XY_f) = \sum_{i=1}^{N} X(\omega_i) Y_f(\omega_i) P(\omega_i) \qquad (2.20)$$

Consider the simplest case where $X = e_i$, the Arrow-Debreu security
contingent on $\omega_i$.

$$f(e_i) = \langle e_i, Y_f \rangle = P(\omega_i) Y_f(\omega_i) \qquad (2.21)$$

$f(e_i)$ is the market price at date 0 of a security which pays 1 at date 1
if the true state of nature is $\omega_i$. It is equal to the product of the probability
$P(\omega_i)$ and $Y_f(\omega_i)$. In fact $Y_f(\omega_i)$ depends on two elements. The first is the
risk-free rate, because you pay the price $f(e_i)$ today to receive the contingent
payoff at a future date. The second element influencing $Y_f(\omega_i)$ is risk aversion. If
you are highly risk-averse, $Y_f(\omega_i)$ will be lower because you are not ready to
pay much to receive the contingent payoff.

Consequently, $Y_f(\omega_i)$ is interpreted as the risk-adjusted discount factor
for the state $\omega_i$. But any general financial asset X, defined by its payoffs, is
a portfolio of such Arrow-Debreu securities.

$$X = \sum_{i=1}^{N} x_i e_i$$

with $X(\omega_i) = x_i$. It follows that:

$$f(X) = \langle X, Y_f \rangle = \sum_{i=1}^{N} x_i f(e_i) = \sum_{i=1}^{N} x_i Y_f(\omega_i) P(\omega_i) \qquad (2.22)$$
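
The valuation formula above can be illustrated numerically. The following sketch assumes a hypothetical three-state economy; the probabilities, the discount factors `discount` (playing the role of $Y_f$) and the payoffs are made-up numbers chosen only to show that pricing a payoff directly gives the same result as pricing it as a portfolio of Arrow-Debreu securities.

```python
def price(payoff, prob, discount):
    """f(X) = E(X * Y_f) = sum over states of X(w_i) * Y_f(w_i) * P(w_i)."""
    return sum(x * y * p for x, y, p in zip(payoff, discount, prob))

# Hypothetical three-state economy (illustrative values only).
prob = [0.5, 0.3, 0.2]        # P(w_i)
discount = [0.8, 1.0, 1.3]    # Y_f(w_i), risk-adjusted discount factors
n = len(prob)

# Price of each Arrow-Debreu security e_i (pays 1 in state i, 0 elsewhere).
ad_prices = [
    price([1.0 if j == i else 0.0 for j in range(n)], prob, discount)
    for i in range(n)
]

# A general payoff priced directly, then as a portfolio of the e_i.
payoff = [100.0, 50.0, 10.0]
direct = price(payoff, prob, discount)
replicated = sum(x * a for x, a in zip(payoff, ad_prices))
print(ad_prices, direct, replicated)
```

Both computations agree because the valuation operator is linear, which is exactly what equation (2.22) expresses.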

12 Remember that on $\mathbb{R}^n$, the usual inner product of two vectors x and y is defined by
$\langle x, y \rangle = \sum_{i=1}^{n} x_i y_i$.
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The projection theorem 

The second important result in $L^2$ is the projection theorem. To introduce
the result, consider once more a vector x in $\mathbb{R}^2$ and a convex$^{13}$ subset
$C \subset \mathbb{R}^2$. In figure 2.2, C is the grey ellipse and z is the point in C which is closest
to x. In other words, z is the orthogonal projection of x on C. Consider any
vector y whose extremity is in C. It is easy to see that the angle between
$x - z$ and $y - z$ will be between 90° and 270°. The cosine of this angle is then
negative. But the cosine of an angle between two vectors is proportional to
the inner product of these vectors$^{14}$. We then have:

$$\langle x - z, y - z \rangle \leq 0 \qquad (2.23)$$

Here the convexity hypothesis is fundamental because it allows us to say
that all points in C are on the same side of the tangent to C at z.



13 A subset A of a vector space is convex if:
$\forall \lambda \in [0; 1], \forall (x, y) \in A \times A, \; \lambda x + (1 - \lambda) y \in A$.

14 In $\mathbb{R}^n$, the cosine of the angle between two vectors x and y is defined by $\langle x, y \rangle / (\|x\| \, \|y\|)$.




Figure 2.2: Projection theorem in $\mathbb{R}^2$

This result is still valid for square-integrable random variables and is also
called the projection theorem.

Theorem 65 Let C denote a non-empty closed convex subset of $L^2$ and $X \in L^2$.
There exists $Z \in C$ such that:

$$\langle X - Z, Y - Z \rangle \leq 0 \text{ for any } Y \in C$$

Z is the orthogonal projection of X on C. In chapter 4, we will
introduce conditional expectations and show that they can also be interpreted in
terms of orthogonal projections.

2.3.4 Covariance and correlation 

An economic agent investing in a portfolio of financial securities has to take 
into account the relationships between assets' returns to make an optimal 
choice. For example, a portfolio containing several stocks of firms in a given 
industrial sector is more sensitive to news concerning this specific sector than 
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a portfolio diversified across industries. Covariance is used to measure these 
relationships between returns. 

Definition 66 Let X and Y denote two random variables in $L^2(\Omega, \mathcal{A}, P)$;
the covariance between X and Y, denoted Cov(X, Y) (or simply $\sigma_{XY}$), is
defined as:

$$Cov(X, Y) = E\left[(X - E(X))(Y - E(Y))\right]$$
Example 67 Let X and Y be defined on a 4-state space as in table 2.1:

State         X(ω)    Y(ω)
$\omega_1$      1       3
$\omega_2$      0       1
$\omega_3$      3       1
$\omega_4$      4       3

Table 2.1: Definition of X and Y

We assume that all states are equally likely, so $E(X) = E(Y) = 2$. The
corresponding centered variables are given in table 2.2.

State         X(ω) - E(X)    Y(ω) - E(Y)
$\omega_1$        -1              1
$\omega_2$        -2             -1
$\omega_3$         1             -1
$\omega_4$         2              1

Table 2.2: Definition of the centered variables



The covariance is then calculated as follows:

$$Cov(X, Y) = \frac{1}{4}\left(-1 \times 1 + (-2) \times (-1) + 1 \times (-1) + 2 \times 1\right) = 0.5$$

This quantity depends, like expectation and variance, on the probability
measure P. Covariance gives a quantitative measure of the (linear)
relationship between X and Y. Moreover, covariance is bilinear; in other words,
for any quadruple a, b, c, d of real numbers and any quadruple of
square-integrable random variables X, Y, Z, W we have:

$$Cov(aX + bY, cZ + dW) = ac \, \sigma_{XZ} + ad \, \sigma_{XW} + bc \, \sigma_{YZ} + bd \, \sigma_{YW} \qquad (2.24)$$

It means in particular that $Cov(aX, Y) = a \, Cov(X, Y)$.
Remark 68 Relation 2.24 also allows us to write:

$$V(X + Y) = V(X) + V(Y) + 2 \, Cov(X, Y)$$

because $Cov(X, X) = V(X)$.

The magnitude of Cov (X, Y) depends on the magnitude of the values of 
X and Y. It is then difficult to compare two covariances and to give an eco- 
nomic interpretation in terms of the intensity of the relationship linking two 
variables. To overcome this problem, we refer to the correlation coefficient 
which uses standardized variables. 



Definition 69 Let X and Y denote two variables in $L^2$; the correlation
coefficient between X and Y, denoted as $\rho_{XY}$, is defined as:

$$\rho_{XY} = \frac{Cov(X, Y)}{\sigma(X) \sigma(Y)}$$

where $\sigma(X)$ and $\sigma(Y)$ are the standard deviations of X and Y.

$\rho_{XY}$ can also be written $Cov\left(\frac{X}{\sigma(X)}, \frac{Y}{\sigma(Y)}\right)$, which is the covariance of two
variables with unit variance. It is the meaning we give to the word
"standardization". In fact, when X and Y are zero-mean random variables, we
get:

$$\rho_{XY} = \frac{\langle X, Y \rangle}{\|X\|_2 \, \|Y\|_2}$$

Here, $\rho_{XY}$ may be interpreted as the cosine of the angle between the two
vectors X and Y. It appears that the length of the vectors, which is in fact their
standard deviation, is neutralized when measuring a correlation coefficient.
The reader must be warned that a high correlation means a
strong linear relationship between two variables, but a low correlation does
not always mean a weak relationship. It only means that a strong linear
relationship does not exist.

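In the example of table 2.1, we get:

$$\sigma(X) = \sqrt{\frac{1}{4}\left((-1)^2 + (-2)^2 + (1)^2 + (2)^2\right)} = \sqrt{2.5} \approx 1.58$$

$$\sigma(Y) = \sqrt{\frac{1}{4}\left((1)^2 + (-1)^2 + (-1)^2 + (1)^2\right)} = 1$$

$$\rho_{XY} = \frac{0.5}{1.58} \approx 0.316$$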
The correlation coefficient is positive but far from 1. We can remark
that if X and Y are multiplied by positive constants, the correlation coefficient
does not change, even though the covariance does. The essential properties of
$\rho$ are summarized in the next proposition.



Proposition 70 Let X and Z denote two random variables in $L^2$ and a, b, c, d
four real numbers:

$$Cov(aX + b, cZ + d) = ac \times Cov(X, Z)$$
$$\rho_{aX+b, cZ+d} = sign(ac) \times \rho_{XZ}$$

Proof. The first equality comes directly from equality 2.24 by choosing Y
and W identically equal to 1. The second equality is obtained by noting that
$\sigma(aX + b) = |a| \sigma(X)$ and $\sigma(cZ + d) = |c| \sigma(Z)$. ■
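
The computations of Example 67 and the scaling rule of Proposition 70 can be checked with a short script. This is a minimal sketch using only the data of table 2.1; the helper names `mean`, `cov` and `corr`, and the constants a = -2, b = 1, c = 3, d = 5, are assumptions made for the illustration.

```python
import math

def mean(v, p):
    """Expectation of v under the probabilities p."""
    return sum(pi * vi for pi, vi in zip(p, v))

def cov(x, y, p):
    """Covariance E[(X - E(X))(Y - E(Y))] on a finite state space."""
    mx, my = mean(x, p), mean(y, p)
    return sum(pi * (xi - mx) * (yi - my) for pi, xi, yi in zip(p, x, y))

def corr(x, y, p):
    """Correlation coefficient Cov(X, Y) / (sigma(X) * sigma(Y))."""
    return cov(x, y, p) / math.sqrt(cov(x, x, p) * cov(y, y, p))

p = [0.25] * 4            # four equally likely states
x = [1, 0, 3, 4]          # X from table 2.1
y = [3, 1, 1, 3]          # Y from table 2.1

cov_xy = cov(x, y, p)
rho = corr(x, y, p)

# Remark 68: V(X + Y) = V(X) + V(Y) + 2 Cov(X, Y)
s = [xi + yi for xi, yi in zip(x, y)]
var_sum = cov(s, s, p)

# Proposition 70 with hypothetical constants a = -2, b = 1, c = 3, d = 5
ax = [-2 * xi + 1 for xi in x]
cy = [3 * yi + 5 for yi in y]
cov_affine = cov(ax, cy, p)   # should equal a * c * Cov(X, Y)
rho_affine = corr(ax, cy, p)  # should equal sign(a * c) * rho
print(cov_xy, round(rho, 3), var_sum, cov_affine, round(rho_affine, 3))
```

Since ac is negative in this illustration, the correlation changes sign but keeps the same absolute value, while the covariance is multiplied by ac.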

An important property of correlations (and covariances) is linked to
independence.
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Proposition 71 If two variables X and Y in L 2 are independent, their cor- 
relation (covariance) is equal to zero. 

It is important to note that the proposition is an implication, not an
equivalence. We could build counterexamples of uncorrelated but
non-independent variables.
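
One classical counterexample can be verified with exact arithmetic. The sketch below assumes X is uniformly distributed on {-1, 0, 1} and Y = X²; this particular choice is an illustration and is not taken from the text.

```python
from fractions import Fraction

states = [-1, 0, 1]          # values of X, each with probability 1/3
p = Fraction(1, 3)

# Y = X**2 is a deterministic function of X, hence clearly dependent on X.
e_x = sum(p * x for x in states)
e_y = sum(p * x**2 for x in states)
e_xy = sum(p * x * x**2 for x in states)
covariance = e_xy - e_x * e_y          # Cov(X, Y) = E(XY) - E(X)E(Y)

# Independence would require P(X=0, Y=0) = P(X=0) * P(Y=0).
p_joint = sum(p for x in states if x == 0 and x**2 == 0)
p_x0 = sum(p for x in states if x == 0)
p_y0 = sum(p for x in states if x**2 == 0)
print(covariance, p_joint, p_x0 * p_y0)
```

The covariance is exactly zero, yet the joint probability (1/3) differs from the product of the marginal probabilities (1/9), so X and Y are uncorrelated without being independent.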



2.4 Equivalent probabilities and Radon-Nikodym
derivatives

2.4.1 Intuition 

To buy a financial asset, risk averse economic agents are not ready to pay the 
expected present value of future cash-flows discounted at the risk-free rate. 
Considering the randomness of cash-flows, they require a risk premium and 
are ready to pay only a lower amount. 

To translate this problem into simple terms using lotteries and keeping
the risk-free rate equal to 0, we use the following example. Let $X_1$ denote
the terminal payoff of a lottery, $X_1$ being 200 or 0 with equal probabilities.
Assume that a risk-averse agent is ready to pay only $X_0 = 90$ to play the
lottery knowing that

$$E(X_1) = \frac{1}{2}\left[0 + 200\right] = 100 \qquad (2.25)$$

In financial theory, there are two ways to link the terminal expected payoff
to the initial price. In a CAPM$^{15}$-like approach, the expectation is discounted
by a risk premium such that

$$X_0 = \frac{E(X_1)}{1 + \text{Risk premium}} = 90 \qquad (2.26)$$

The difficulty of this approach is obviously the determination of the risk
premium. If $X_0$ is the equilibrium market price, the risk premium is a
complicated function of the utility functions of all agents.

The other approach, mainly used to value derivative securities, is to
discount future cash-flows at the risk-free rate (equal to 0 in our example) but
to change the probabilities entering the calculation of the expectation.



15 CAPM = Capital Asset Pricing Model 
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In the above example, we are looking for an alternative probability mea- 
sure Q (different from P) such that 

$$X_0 = E_Q(X_1) \qquad (2.27)$$

The probabilistic framework is very simple since there are only two states
of nature. In fact, $\Omega = \{\omega_1, \omega_2\}$, $X_1(\omega_1) = 200$ and $X_1(\omega_2) = 0$.

Let Q be defined by:

$$Q(\omega_1) = q_1 = 0.45$$
$$Q(\omega_2) = q_2 = 1 - q_1 = 0.55$$

It is obvious that we get the desired result $E_Q(X_1) = 90 = X_0$. The
probability Q is easily obtained because we only have to solve a system of
two equations with two unknowns:

$$q_1 \times 200 + q_2 \times 0 = 90 \qquad (2.28)$$
$$q_1 + q_2 = 1 \qquad (2.29)$$

This technique is now very common in finance, especially in the valuation 
of options. However, the idea seems artificial because one can have the feeling 
that Q depends on the risky asset considered in the calculation. Nevertheless 
it is very powerful when associated with the no arbitrage assumption, as 
shown in the following example. 

Example 72 Consider a two-state one-period economy with one risk-free 
asset (the risk-free rate is still assumed to be 0) and two risky assets defined 
by: 

$$X_1(\omega_1) = 200 \quad X_1(\omega_2) = 100 \quad X_0 = 130 \qquad (2.30)$$
$$Y_1(\omega_1) = 150 \quad Y_1(\omega_2) = 110 \quad Y_0 = 120 \qquad (2.31)$$

We first look for a probability Q such that $X_0 = E_Q(X_1)$. We have to solve:

$$130 = 200 \, Q(\omega_1) + 100 \, (1 - Q(\omega_1))$$

The solution is $Q(\omega_1) = 0.3$.

Look now for a probability Q' such that $Y_0 = E_{Q'}(Y_1)$. The equation to be
solved is:

$$150 \, Q'(\omega_1) + 110 \, (1 - Q'(\omega_1)) = 120$$

and the solution is $Q'(\omega_1) = 0.25$.

Q and Q' are different. It then seems that changing the probability mea- 
sure is useless since we get a different probability measure, depending on the 
risky asset chosen for the calculation. 

Fortunately, it is not the case if there are no arbitrage opportunities. An 
arbitrage opportunity is a portfolio which costs nothing at date 0 (or has a 
negative cost) and pays positive (at least zero) amounts in all states of nature 
at date 1. If such opportunities exist, current prices cannot be equilibrium 
prices. 

We show hereafter that if Q and Q' are different, there must be an
arbitrage opportunity or, equivalently, $(X_0, Y_0)$ cannot be a vector of equilibrium
prices.

Let us build a portfolio $\theta = (\theta_X, \theta_Y, \theta_Z)$ whose payoff is zero in both states:

$$\theta_X \begin{pmatrix} 200 \\ 100 \end{pmatrix} + \theta_Y \begin{pmatrix} 150 \\ 110 \end{pmatrix} + \theta_Z \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

where $\theta_Z$ is the number of units of the risk-free asset whose price and
payoffs are equal to 1 (because the risk-free rate is zero). There are infinitely
many portfolios satisfying these equations. The question is whether it is
possible to find such a portfolio with a negative cost (it would be an arbitrage
opportunity).

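Consider $\theta_X = -2$; $\theta_Y = 5$; $\theta_Z = -350$; we get:

$$-2 \begin{pmatrix} 200 \\ 100 \end{pmatrix} + 5 \begin{pmatrix} 150 \\ 110 \end{pmatrix} - 350 \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

The cost of this portfolio is given by: $-2 \times 130 + 5 \times 120 - 350 = -10$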
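
The arithmetic of this portfolio is easy to verify. The sketch below uses only the numbers of Example 72; the dictionary layout and variable names are choices made for the illustration.

```python
# Date-1 payoffs in states (w1, w2) and date-0 prices from Example 72.
payoffs = {"X": (200, 100), "Y": (150, 110), "Z": (1, 1)}
prices = {"X": 130, "Y": 120, "Z": 1}

# Quantities held: short 2 units of X, long 5 units of Y, short 350 units
# of the risk-free asset Z.
theta = {"X": -2, "Y": 5, "Z": -350}

# Payoff of the portfolio in each state, and its cost at date 0.
state_payoffs = tuple(
    sum(theta[a] * payoffs[a][s] for a in theta) for s in range(2)
)
cost = sum(theta[a] * prices[a] for a in theta)
print(state_payoffs, cost)
```

The portfolio pays nothing in either state but has a cost of -10: the agent receives 10 at date 0 without any future obligation, which is an arbitrage opportunity.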

It happens that an agent characterized by a strictly increasing utility
function is ready to "buy" (but the price is negative!) an infinite quantity of this
portfolio. Consequently $(X_0, Y_0)$ cannot be a vector of equilibrium prices.

As the arbitrage portfolio requires a short position on $X_1$ and a long
position on $Y_1$, the price $X_0$ is too high and $Y_0$ is too low (in relative terms).
To get an equilibrium, it is necessary that $-2X_0 + 5Y_0 = 350$. As long as this
equality is not satisfied, it is possible to build an arbitrage portfolio. To keep
things simple, suppose that the price adjustment is realized only on $Y_0$. The
equilibrium price is then $Y_0 = 122$.
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Recalculate now the probability Q" with this new price. 

$$122 = 150 \, Q''(\omega_1) + 110 \, (1 - Q''(\omega_1))$$

implies $Q''(\omega_1) = \frac{12}{40} = 0.3$.
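
The three probabilities computed in this example all come from the same one-line formula. The following sketch, which assumes a zero risk-free rate as in the example, solves price = q × up + (1 - q) × down for q; the function name `risk_neutral_prob` is hypothetical.

```python
from fractions import Fraction

def risk_neutral_prob(price, up, down):
    """Solve price = q * up + (1 - q) * down for q (zero risk-free rate)."""
    return Fraction(price - down, up - down)

q = risk_neutral_prob(130, 200, 100)         # from asset X at X_0 = 130
q_prime = risk_neutral_prob(120, 150, 110)   # from asset Y at Y_0 = 120
q_second = risk_neutral_prob(122, 150, 110)  # from asset Y at Y_0 = 122
print(q, q_prime, q_second)
```

At the arbitrage-free price of 122, asset Y yields the same probability as asset X, whereas the initial price of 120 gives a different one.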

A kind of miracle appears in this example! Q" is exactly the probability 
measure Q when the market is free from arbitrage opportunities. In other 
words, the probability we build does not depend on the security we use 
to calculate it, if prices are equilibrium prices. Q characterizes the whole 
market, not a specific security. It is the reason why the change of probability 
technique is so powerful for valuing derivative securities. 

We let the reader check that when the risk-free rate r is not zero, that
is when the price of the risk-free asset is $\frac{1}{1+r}$, the price of a contract $X_1$ is
written:

$$X_0 = \frac{1}{1+r} E_Q(X_1) \qquad (2.32)$$
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The economic interpretation of this formula is also very interesting. If the 
price of any asset is obtained by discounting the expected cash-flows at the 
risk-free rate, it is as if agents were risk-neutral because they don't require 
any risk premium. It explains why the probability measure Q is usually 
labelled "risk-neutral probability measure". 

2.4.2 Radon-Nikodym derivatives

To justify the not so intuitive assumptions of the current section, we mention
an important point relative to the prices of Arrow-Debreu securities. An
Arrow-Debreu security contingent on a state $\omega$ is a financial asset which
pays one monetary unit if $\omega$ occurs and nothing if another state occurs. It
can be written as the indicator function $1_{\{\omega\}}$.

Consider an Arrow-Debreu security contingent on a state $\omega_1$, denoted as
$A_1$, in a one-period model with a zero risk-free rate. If $P(\omega_1) > 0$, the initial
price $A_0$ of this security is strictly positive but possibly lower than $P(\omega_1)$,
due to risk aversion. When changing the probability to write $A_0 = E_Q(A_1)$,
one remarks that $E_Q(A_1) = Q(\omega_1)$. Consequently, the probability change
works (from the economic point of view) only if events having a positive (zero)
probability under P also have a positive (zero) probability under Q. The
reason is that the price of an Arrow-Debreu security is positive (zero) when
the probability of getting one monetary unit at the terminal date is positive
(zero).

This remark justifies the definitions hereafter. 

Definition 73 a) A probability measure Q is absolutely continuous with
respect to (w.r.t.) P if:

$$\forall B \in \mathcal{A}, \; P(B) = 0 \Rightarrow Q(B) = 0$$

and we note $Q \ll P$.

b) Two probability measures P and Q are equivalent if:

$$\forall B \in \mathcal{A}, \; P(B) = 0 \Leftrightarrow Q(B) = 0$$

Point (b) also means that $Q \ll P$ and $P \ll Q$.

The essential result allowing us to use the change of probability method in
a rigorous way is the Radon-Nikodym theorem.
Proposition 74 $Q \ll P$ if and only if there exists a positive $\mathcal{A}$-measurable
function $\phi$ such that:

$$\forall B \in \mathcal{A}, \; Q(B) = \int_B \phi \, dP$$

Remarks: It is clear that $P(B) = 0 \Rightarrow Q(B) = 0$ since the integral
is then 0. Moreover, $\phi$, which is a random variable, is strictly positive Q-a.s.
(and also P-a.s. when P and Q are equivalent). If we
note that $Q(B) = \int_B \phi \, dP$, it allows us to denote $\phi = \frac{dQ}{dP}$ by analogy with usual
differential calculus. $\phi$ is then called the Radon-Nikodym derivative of Q
with respect to P. Finally, if P and Q are equivalent, $\frac{dQ}{dP}$ and $\frac{dP}{dQ}$ exist and:

$$\frac{dQ}{dP} = \frac{1}{\frac{dP}{dQ}}$$

Proposition 75 Let P and Q be two equivalent probability measures on $(\Omega, \mathcal{A})$
and $\phi = \frac{dQ}{dP}$. We have the following equality:

$$E_Q(X_1) = E(\phi X_1)$$

In the abovementioned financial interpretation involving a simple lottery
$X_1$, $E_Q(X_1)$ is the price of $X_1$. It can also be expressed as the expectation
under P of a transformed payoff$^{16}$ $\phi X_1$. We can also write $E(\phi X_1) = \langle \phi, X_1 \rangle$
where the inner product is the one defined in $L^2(\Omega, \mathcal{A}, P)$. $\phi$ then represents
(in the sense of the Riesz theorem) the linear valuation operator.

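Specific case: Assume $Card(\Omega) = N$, $\mathcal{A} = \mathcal{P}(\Omega)$ and $P(\omega) > 0$ for any
$\omega$; each state of nature is an event and we get:

$$Q(\omega) = \int_{\{\omega\}} \phi \, dP = \phi(\omega) P(\omega)$$

$\phi$ is then defined by:

$$\phi(\omega) = \frac{Q(\omega)}{P(\omega)}$$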
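
In the finite case, the Radon-Nikodym derivative is just a ratio of probabilities, which can be checked on the lottery of section 2.4.1. The sketch below reuses the values P = (0.5, 0.5), Q = (0.45, 0.55) and payoffs (200, 0) from that example; the dictionary-based representation is an assumption of the illustration.

```python
# Lottery of section 2.4.1: two states, zero risk-free rate.
P = {"w1": 0.5, "w2": 0.5}      # historical probability
Q = {"w1": 0.45, "w2": 0.55}    # risk-neutral probability
X1 = {"w1": 200.0, "w2": 0.0}   # terminal payoff

# Radon-Nikodym derivative phi = dQ/dP, state by state.
phi = {w: Q[w] / P[w] for w in P}

# Price computed under Q, and as the P-expectation of phi * X1.
price_under_q = sum(Q[w] * X1[w] for w in P)
price_under_p = sum(P[w] * phi[w] * X1[w] for w in P)

# phi has expectation 1 under P because Q is a probability measure.
mean_phi = sum(P[w] * phi[w] for w in P)
print(phi, price_under_q, price_under_p, mean_phi)
```

Both expectations give the price of 90 (up to floating-point rounding), illustrating Proposition 75.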



16 In the usual microeconomic one-period model, $\phi$ is proportional to the ratio of the
date-1 marginal utility of consumption and of the date-0 marginal utility of consumption
(see Dothan, 1990).
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2.5 Random vectors 
2.5.1 Definitions 

In portfolio management, it is common to deal with a large number of stocks 
whose returns are random variables. It is then more efficient to use vector 
notations and matrices to present the calculations. 



Definition 76 An n-dimensional random vector is a random variable
defined on $(\Omega, \mathcal{A}, P)$ and taking values in $(\mathbb{R}^n, \mathcal{B}_{\mathbb{R}^n})$. We then write
$X = (X_1, \ldots, X_n)'$ where$^{17}$ the $X_i$ are real random variables.

The joint distribution of a random vector is defined by its cumulative 
distribution function or its density (when it is relevant). 



17 The "prime" denotes transposition as usual. Without other indications, vectors are
column-vectors.
Definition 77 a) The CDF of a vector $X = (X_1, \ldots, X_n)'$ is the function
$F_X$ from $\mathbb{R}^n$ to $[0; 1]$ defined by:

$$F_X(x) = P\left(\cap_{i=1}^{n} \{X_i \leq x_i\}\right)$$

where $x \in \mathbb{R}^n$ is equal to $(x_1, x_2, \ldots, x_n)'$.

b) If the $X_i$ are continuous random variables, the joint density of X is a
positive function $f_X$ from $\mathbb{R}^n$ to $\mathbb{R}$ satisfying:

$$F_X(x) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f_X(u) \, du_1 \ldots du_n$$

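The vocabulary defined for random variables is still valid for random
vectors; in particular a random vector is integrable or square integrable if all
its components satisfy this property. In this case, E(X) denotes the vector
of expectations of the $X_i$ and $V_X$ stands for the covariance matrix defined
by:

$$V_X = \begin{pmatrix} V(X_1) & \cdots & Cov(X_1, X_j) & \cdots & Cov(X_1, X_n) \\ \vdots & \ddots & & & \vdots \\ Cov(X_j, X_1) & & V(X_j) & & \vdots \\ \vdots & & & \ddots & \vdots \\ Cov(X_n, X_1) & \cdots & \cdots & \cdots & V(X_n) \end{pmatrix}$$

A simplified and common notation for $V_X$ is the following:

$$V_X = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1n} \\ \vdots & \sigma_{ij} & \vdots \\ \sigma_{n1} & \cdots & \sigma_{nn} \end{pmatrix}$$

where $\sigma_{ij} = Cov(X_i, X_j)$, so that $\sigma_{ii} = V(X_i)$.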

Using random vectors is especially interesting because the rules governing 
matrix calculations can be used. The following proposition summarizes these 
essential calculation rules. 



Proposition 78 Let X denote a square-integrable n-dimensional random
vector and U, W denote two n-dimensional vectors in $\mathbb{R}^n$.

1) $E(U'X) = U'E(X)$

2) $E(U'X \, W'X) = U'E(XX')W$

3) $V(U'X) = U'V_X U$

4) $Cov(U'X, W'X) = U'V_X W$
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These notations may appear confusing at a first glance because U'X = 
Y^i=\ UiXi is a random variable, so V(U'X) is a number and Vj is a (n, n) 
matrix. Moreover, XX' is a n x n matrix so E(XX') is also an x n matrix 
with generic element E(XiXj). It is important for a student in finance to be 

comfortable with these notations because they are very common in portfolio 
choice problems. 
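
The four rules of Proposition 78 can be checked numerically on a small example. The sketch below is only an illustration: the three-state distribution, the weight vectors U and W, and the helper names (expect, dot, quad) are assumptions introduced for this check, not part of the text.

```python
# Numerical check of Proposition 78 on a toy 2-dimensional random vector.
# All numbers below are illustrative assumptions.

probs = [0.2, 0.5, 0.3]                                   # P(state k)
outcomes = [(0.10, -0.05), (0.02, 0.03), (-0.04, 0.08)]   # values of X = (X1, X2)'
U = (0.6, 0.4)
W = (0.3, 0.7)
n = 2

def expect(f):
    """Expectation of f(X) under the discrete distribution above."""
    return sum(p * f(x) for p, x in zip(probs, outcomes))

def dot(a, b):
    """Scalar product a'b."""
    return sum(ai * bi for ai, bi in zip(a, b))

def quad(a, M, b):
    """Bilinear form a' M b for vectors a, b and a square matrix M."""
    return sum(a[i] * M[i][j] * b[j] for i in range(n) for j in range(n))

EX = [expect(lambda x, i=i: x[i]) for i in range(n)]                     # E(X)
EXX = [[expect(lambda x, i=i, j=j: x[i] * x[j]) for j in range(n)]
       for i in range(n)]                                                # E(XX')
VX = [[EXX[i][j] - EX[i] * EX[j] for j in range(n)] for i in range(n)]   # V_X

# Left-hand sides, computed directly from the scalar variables U'X and W'X.
e_ux = expect(lambda x: dot(U, x))
e_wx = expect(lambda x: dot(W, x))
e_prod = expect(lambda x: dot(U, x) * dot(W, x))
var_ux = expect(lambda x: dot(U, x) ** 2) - e_ux ** 2
cov_uw = e_prod - e_ux * e_wx

print("rule 1:", abs(e_ux - dot(U, EX)) < 1e-12)
print("rule 2:", abs(e_prod - quad(U, EXX, W)) < 1e-12)
print("rule 3:", abs(var_ux - quad(U, VX, U)) < 1e-12)
print("rule 4:", abs(cov_uw - quad(U, VX, W)) < 1e-12)
```

Each left-hand side is a moment of a scalar random variable, while each right-hand side is a matrix expression built from $E(X)$, $E(XX')$ or $V_X$; the four comparisons should all print True.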

2.5.2 Application to portfolio choice 

Consider a financial market with $n$ traded stocks; $X$ denotes the vector of
returns and $U \in \mathbb{R}^n$ stands for the vector of proportions invested in the $n$
stocks. The random return of portfolio $U$, denoted $R$, can be written as:

$$R = U'X = \sum_{i=1}^{n} U_i X_i$$

By Proposition 78, the expected return and the variance of the portfolio
return are:

$$E(R) = U'E(X)$$
$$V(R) = U'V_X U$$

Assume that $E(X)$ has at least two components which differ in value. If
this were not the case, all portfolios would have the same expected return and
the problem would be trivial.

For $U$ to be a portfolio, it is necessary that:

$$\sum_{i=1}^{n} U_i = 1$$

which may be written $U'\mathbf{1} = 1$ where $\mathbf{1}$ is the vector in $\mathbb{R}^n$ whose components
are all equal to 1.

The standard portfolio choice problem consists in minimizing the
variance of return subject to a given level of expected return, say $e$.
$V_X$ is assumed positive definite$^{18}$, meaning that it is not possible to build a
risk-free portfolio with risky assets. It is not really a restrictive assumption
because a risk-free asset can be introduced separately in the model.

18 An $(n, n)$ matrix $M$ is positive definite if and only if, for all $x \in \mathbb{R}^n$, $x \neq 0 \Rightarrow x'Mx > 0$.

The optimization problem is then written:

$$\min_{U} \frac{1}{2} U'V_X U$$

with the constraints

$$U'E(X) = e$$
$$U'\mathbf{1} = 1$$

The coefficient $\frac{1}{2}$ does not change the optimal solution. It is purely
conventional and avoids carrying a factor 2 in the expression of the derivative.

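The Lagrangian is given by:

$$\mathcal{L}(U, \lambda, \mu) = \frac{1}{2} U'V_X U + \lambda\,(e - U'E(X)) + \mu\,(1 - U'\mathbf{1})$$

To simplify notations in the following, let us denote $V_X = V$ and $E(X) = r$;
the first-order conditions of the problem are:

$$\frac{\partial \mathcal{L}}{\partial U} = VU - \lambda r - \mu \mathbf{1} = 0$$
$$\frac{\partial \mathcal{L}}{\partial \lambda} = e - U'r = 0$$
$$\frac{\partial \mathcal{L}}{\partial \mu} = 1 - U'\mathbf{1} = 0$$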

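As $V$ is invertible, the first condition gives:

$$U = \lambda V^{-1} r + \mu V^{-1} \mathbf{1}$$

Using the two other conditions leads to:

$$e = \lambda\, r'V^{-1} r + \mu\, r'V^{-1} \mathbf{1}$$
$$1 = \lambda\, \mathbf{1}'V^{-1} r + \mu\, \mathbf{1}'V^{-1} \mathbf{1}$$

After a few calculations, we get:

$$U = \frac{1}{D} \left[ (eC - A) V^{-1} r + (B - eA) V^{-1} \mathbf{1} \right]$$

where

$$A = r'V^{-1} \mathbf{1}$$
$$B = r'V^{-1} r$$
$$C = \mathbf{1}'V^{-1} \mathbf{1}$$
$$D = BC - A^2$$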
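
The closed-form solution can be tried on numbers. The following sketch is a minimal illustration under stated assumptions: the covariance matrix V, the expected returns r and the target e are made-up values, and solve is a small helper written for this example (plain Gaussian elimination) so that no external library is needed.

```python
# Minimum-variance portfolio for a target expected return e (illustrative sketch).
# V, r and e are assumed values; solve() is a helper written for this example.

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(M)]     # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for k in range(col + 1, n):
            factor = A[k][col] / A[col][col]
            for j in range(col, n + 1):
                A[k][j] -= factor * A[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

V = [[0.040, 0.006, 0.004],      # assumed positive definite covariance matrix
     [0.006, 0.090, 0.010],
     [0.004, 0.010, 0.160]]
r = [0.05, 0.08, 0.12]           # assumed expected returns E(X)
ones = [1.0, 1.0, 1.0]
e = 0.09                         # target expected return

Vinv_r = solve(V, r)             # V^{-1} r
Vinv_1 = solve(V, ones)          # V^{-1} 1
A = dot(r, Vinv_1)
B = dot(r, Vinv_r)
C = dot(ones, Vinv_1)
D = B * C - A ** 2

# U = (1/D) [ (eC - A) V^{-1} r + (B - eA) V^{-1} 1 ]
U = [((e * C - A) * vr + (B - e * A) * v1) / D for vr, v1 in zip(Vinv_r, Vinv_1)]
variance = sum(U[i] * V[i][j] * U[j] for i in range(3) for j in range(3))

print("weights:", [round(u, 4) for u in U])
print("sum of weights:", round(sum(U), 10))
print("expected return:", round(dot(U, r), 10))
print("variance:", round(variance, 6))
```

The printed sum of weights and expected return should reproduce the two constraints $U'\mathbf{1} = 1$ and $U'r = e$. The formula imposes no sign restriction on the components of $U$, so short positions can appear for other inputs.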

As an exercise, the reader can check that the mapping $x \mapsto \sqrt{x'V^{-1}x}$
defines a norm on $\mathbb{R}^n$ induced by the inner product $\langle x, y \rangle = x'V^{-1}y$. Deduce
from this result that $D$ is strictly positive.

Chapter 3

Usual probability distributions 
in financial models 

This chapter covers only the most popular probability distributions
encountered in financial models, and especially in finance textbooks. Obviously, it
is not a complete tour of probability distributions and, even in finance, many
others can be found in scientific papers in the field. However, the few
distributions developed hereafter cover most of what has appeared in financial
models in the last 50 years. It is then a good starting point for students in
finance.

This chapter is divided into two parts. The first one describes the
properties of discrete distributions, essentially the Bernoulli, binomial and Poisson
distributions. The second part deals with the most common continuous
distributions, namely the uniform, Gaussian and Log-normal distributions. At
the end, we also briefly present some other useful distributions appearing
in statistical tests. These are the $\chi^2$, the Student-t and the Fisher-Snedecor
distributions. They are deduced from the Gaussian distribution.

3.1 Discrete distributions 

3.1.1 Bernoulli distribution 

Definition and example 

The simplest probability distribution is the so-called Bernoulli
distribution.

Definition 79 A random variable $X$ follows a Bernoulli distribution with
parameter $p$ if $X$ takes values 1 and 0 with probabilities $p$ and $1 - p$.

We can observe that if $B \in \mathcal{A}$ and $P(B) = p$, the indicator function of $B$
follows a Bernoulli distribution with parameter $p$. It is denoted as

$$\mathbf{1}_B \sim \mathcal{B}(p) \qquad (3.1)$$

As a natural extension, any variable taking two different values is often said
to follow a Bernoulli distribution. In fact, if $X$ takes values $a$ and $b$ ($a > b$) with
probabilities $p$ and $1 - p$, the variable $Y = \frac{1}{a - b}(X - b)$ takes values 1 and 0
with probabilities $p$ and $1 - p$. $Y$ is then $\mathcal{B}(p)$. The use of this distribution in
finance is, in many cases, pedagogical. For example, in a one-period model,
it is convenient to model variations of the logarithm of a stock price by a
Bernoulli distribution. In this case, the two possible values are denoted $\ln(u)$
and $\ln(d)$ ($u$ for up and $d$ for down). If the initial (date-0) price is denoted
$S_0$, the date-1 price, denoted $S_1$, takes two possible values $uS_0$ and $dS_0$. We
observe that:

$$\ln(S_1) = \ln(S_0) + X$$

where $X$ is a Bernoulli variable taking values $\ln(u)$ and $\ln(d)$. This simple
model, extended to the multi-period case, has given rise to the so-called
binomial model (see next section).

In chapter 1, we described binary options (example 36) traded on the
Chicago Board Options Exchange. These options pay \$100 or 0 at the
maturity date, depending on the occurrence of an event $B = \{SP_T > K\}$ where
$SP_T$ is the value of the S&P500 index at the maturity date $T$ of the contract
and $K$ is the exercise price. Definition 79 shows that the payoff of this type
of option follows a Bernoulli distribution. Obviously, we can expect that the
price quoted at date 0 is a function of $P(B)$.

Expectation and variance 

Proposition 80 If $X \sim \mathcal{B}(p)$, then $E(X) = p$ and $\sigma^2(X) = p(1 - p)$.

Proof. If a random variable $X$ follows a Bernoulli distribution with
parameter $p$, the expectation $E(X)$ is given by:

$$E(X) = p \times 1 + (1 - p) \times 0 = p$$

The variance of $X$, denoted as $\sigma^2(X)$, is obtained in the same way by
using the formula$^{1}$:

$$\sigma^2(X) = E(X^2) - E(X)^2 = p - p^2 = p(1 - p)$$

1 Note that a Bernoulli variable takes only the values 0 and 1, so it satisfies $X = X^2$.

Example 81 Assume that a random variable $Y$ takes two values $y_1$ and $y_2$
with probabilities $p$ and $(1 - p)$. Then, we get:

$$\sigma^2(Y) = p(1 - p)(y_1 - y_2)^2 \qquad (3.2)$$

To show this equality, note that $X = \frac{1}{y_1 - y_2}(Y - y_2)$ follows $\mathcal{B}(p)$. Writing
$Y = (y_1 - y_2)X + y_2$ allows us to deduce directly:

$$E(Y) = (y_1 - y_2)E(X) + y_2 = p\,y_1 + (1 - p)\,y_2 \qquad (3.3)$$
$$\sigma^2(Y) = (y_1 - y_2)^2 \sigma^2(X) = p(1 - p)(y_1 - y_2)^2 \qquad (3.4)$$

In the specific case of the binomial model of stock prices, introduced in
chapter 1 (example 9), and if $Y$ denotes the logarithmic return of the stock,
we have $y_1 = \ln(u)$ and $y_2 = \ln(d)$. Consequently, the variance of the stock
return is:

$$\sigma^2(Y) = p(1 - p)\left[\ln\left(\frac{u}{d}\right)\right]^2 \qquad (3.5)$$

3.1.2 Binomial distribution

Definition and example 

Stock returns in a one-period model were represented by a Bernoulli
distribution. Consider now a multi-period model and assume that successive returns
are independent Bernoulli random variables. Implicitly, independence of
successive returns refers to the efficient market hypothesis, meaning that all
information is instantaneously reflected in prices. If, in addition, we assume
that the parameters $u$ and $d$ are constant over time (constant volatility), the
log-price variations on given horizons are driven by a binomial distribution
defined as follows.

Definition 82 A variable $X$ follows a binomial distribution with
parameters $n$ and $p$ if $X$ is written as the sum of $n$ independent Bernoulli variables
$X_i$, $i = 1, \dots, n$, each of them following $\mathcal{B}(p)$. We then have:

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$$

where $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$ is the number of combinations of $k$ objects among $n$. The
distribution of $X$ is denoted $\mathcal{B}(n, p)$.

The binomial distribution is very popular in finance because it is the
foundation of the famous option valuation model developed by Cox, Ross and
Rubinstein (1979). This model describes the evolution of a stock price $S$ in
discrete time; if $S_t$ is the date-$t$ stock price, the date-$(t + 1)$ price is defined
by:

$$S_{t+1} = S_t \times X_{t+1}$$

where $X_{t+1}$ takes values $u$ and $d$ with probabilities $p$ and $1 - p$. The variables
$X_t$ are assumed independent. $S_t$ is then equivalently defined by:

$$S_t = S_0 \times \prod_{s=1}^{t} X_s$$

from which we get:

$$\ln\left(\frac{S_t}{S_0}\right) = \sum_{s=1}^{t} \ln(X_s)$$

The left-hand side (LHS) is the stock return between $s = 0$ and $s = t$ and
the right-hand side (RHS) is the sum of $t$ independent Bernoulli variables
taking values $\ln(u)$ and $\ln(d)$ with probabilities $p$ and $1 - p$.

The number of up moves among the $t$ periods therefore follows $\mathcal{B}(t, p)$, which implies:

$$P\left(\ln(S_t) = \ln(S_0) + k \ln(u) + (t - k)\ln(d)\right) = \binom{t}{k} p^k (1 - p)^{t-k}$$

where $\binom{t}{k}$ is the binomial coefficient counting the number of price paths
containing $k$ up moves (and then $t - k$ down moves).

Expectation and variance 

The expectation and variance of the binomial distribution come immediately
from the moments of the Bernoulli distribution.

Proposition 83 If $X \sim \mathcal{B}(n, p)$ then $E(X) = np$ and $\sigma^2(X) = np(1 - p)$.

Proof. The binomial distribution $\mathcal{B}(n, p)$ is defined as the sum of $n$
independent and identically distributed variables (obeying $\mathcal{B}(p)$).

It follows immediately that if $X \sim \mathcal{B}(n, p)$:

$$E(X) = np \quad \text{and} \quad \sigma^2(X) = np(1 - p)$$

because expectations and variances of the $n$ independent Bernoulli variables
entering the binomial distribution can be added$^{2}$. ■

In the abovementioned Cox-Ross-Rubinstein model, we get the moments
of log returns on $t$ periods of time as:

$$E\left[\ln\left(\frac{S_t}{S_0}\right)\right] = t\,\left(p \ln(u) + (1 - p)\ln(d)\right)$$

$$\sigma^2\left[\ln\left(\frac{S_t}{S_0}\right)\right] = t\,p(1 - p)\left[\ln\left(\frac{u}{d}\right)\right]^2$$

These expressions highlight the advantage of logarithmic returns. The
first two moments of returns are simply the sum of the same moments on
subintervals. For example, the variance of weekly returns is the sum of the
variances of daily returns in the week under consideration. This property is
not true when returns are calculated linearly as $\frac{S_{t+1} - S_t}{S_t}$.
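
The two moment formulas of the Cox-Ross-Rubinstein model can be verified by enumerating the distribution of the log return. The sketch below is illustrative only: the values of u, d, p and t are assumptions chosen for the example.

```python
from math import comb, log

# Illustrative parameters (assumptions, not taken from the text).
u, d, p, t = 1.1, 0.95, 0.55, 5

# Distribution of ln(S_t / S_0): k up moves and t - k down moves.
values = [k * log(u) + (t - k) * log(d) for k in range(t + 1)]
probs = [comb(t, k) * p**k * (1 - p)**(t - k) for k in range(t + 1)]

# Moments computed directly from the distribution.
mean = sum(q * v for q, v in zip(probs, values))
var = sum(q * v**2 for q, v in zip(probs, values)) - mean**2

# Moments given by the closed-form expressions.
mean_formula = t * (p * log(u) + (1 - p) * log(d))
var_formula = t * p * (1 - p) * log(u / d) ** 2

print(abs(mean - mean_formula) < 1e-12)
print(abs(var - var_formula) < 1e-12)
```

Both moments are proportional to t, which is the additivity property of logarithmic returns discussed above.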



2 Remember that independent variables are uncorrelated.

3.1.3 Poisson distribution

Definition 

This probability distribution is often used in insurance to describe the arrival
of claims or in microstructure theory to model the flows of buy and sell
orders on financial markets.

Definition 84 A variable $X$ follows a Poisson distribution with
parameter $\lambda > 0$ if $X$ takes non-negative integer values and its distribution is defined by:

$$P(X = k) = \exp(-\lambda)\,\frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \dots$$

We then note $X \sim \mathcal{P}(\lambda)$.

Figure 3.1 shows the CDF of $\mathcal{P}(2)$. It appears as a step function because
the Poisson distribution is discrete.

Figure 3.1: CDF of $\mathcal{P}(2)$



Expectation and variance 



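Proposition 85 If $X \sim \mathcal{P}(\lambda)$ then $E(X) = \lambda$ and $\sigma^2(X) = \lambda$.

Proof. The expectation is deduced from the definition of the exponential
function as the sum of an infinite series, $e^x = \sum_{k=0}^{+\infty} \frac{x^k}{k!}$.

$$E(X) = \sum_{k=0}^{+\infty} k\,P(X = k) = \sum_{k=0}^{+\infty} k \exp(-\lambda) \frac{\lambda^k}{k!} = \exp(-\lambda) \sum_{k=1}^{+\infty} k \frac{\lambda^k}{k!}$$

$$= \lambda \exp(-\lambda) \sum_{k=1}^{+\infty} \frac{\lambda^{k-1}}{(k-1)!} = \lambda \exp(-\lambda) \sum_{k=0}^{+\infty} \frac{\lambda^k}{k!} = \lambda \exp(-\lambda) \exp(\lambda) = \lambda$$

To get the variance, the calculation is a little more involved:

$$\sigma^2(X) = E(X^2) - E(X)^2 = \exp(-\lambda) \sum_{k=0}^{+\infty} k^2 \frac{\lambda^k}{k!} - \lambda^2 \qquad (3.6)$$

We can write:

$$\sum_{k=0}^{+\infty} k^2 \frac{\lambda^k}{k!} = \sum_{k=1}^{+\infty} k \frac{\lambda^k}{(k-1)!} = \sum_{k=1}^{+\infty} (k - 1) \frac{\lambda^k}{(k-1)!} + \sum_{k=1}^{+\infty} \frac{\lambda^k}{(k-1)!}$$

$$= \lambda^2 \sum_{k=2}^{+\infty} \frac{\lambda^{k-2}}{(k-2)!} + \lambda \sum_{k=1}^{+\infty} \frac{\lambda^{k-1}}{(k-1)!}$$

$$= \lambda^2 \sum_{k=0}^{+\infty} \frac{\lambda^k}{k!} + \lambda \sum_{k=0}^{+\infty} \frac{\lambda^k}{k!}$$

$$= (\lambda^2 + \lambda) \exp(\lambda)$$

Replacing in equation 3.6 leads to $\sigma^2(X) = \lambda$. ■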

A specific feature of $\mathcal{P}(\lambda)$ is the equality of expectation and variance. This
property is useful when one wants to test if a given random variable follows
this distribution.

The Poisson distribution is also commonly used to approximate a
binomial distribution $\mathcal{B}(n, p)$ when $n$ is large and $p$ is close to 0. For example, if
you analyze the number of jackpot winners in a 6/49 lotto game, the
probability of winning the jackpot is about one chance in 14 million (there are
$\binom{49}{6} = 13{,}983{,}816$ possible draws). It is usual
that 20 or 30 million tickets are bought by players. The number of jackpot
winners then follows a binomial distribution where $n$ is the number of
tickets and $p$ is the probability of winning the jackpot. The reader can easily
check that the expected number of winners is $np$ and the variance $np(1 - p)$
according to the properties of the binomial distribution. But $np(1 - p) \approx np$
since $p$ is almost 0. Consequently, expectation and variance are almost equal
and the distribution of the number of winners can be approximated by a
Poisson distribution with parameter $\lambda = np$.
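
The quality of this approximation can be inspected numerically. The sketch below is a hedged illustration: it assumes 20 million tickets, each an independent random pick in a 6/49 game, and the function names binom_pmf and poisson_pmf are introduced for the example.

```python
from math import comb, exp, factorial

n = 20_000_000            # assumed number of tickets (order of magnitude from the text)
p = 1 / comb(49, 6)       # probability that one ticket wins a 6/49 jackpot
lam = n * p               # Poisson parameter lambda = n p

def binom_pmf(k):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k):
    """P(X = k) for X ~ P(lam)."""
    return exp(-lam) * lam**k / factorial(k)

print("lambda =", round(lam, 4))
for k in range(6):
    print(k, round(binom_pmf(k), 6), round(poisson_pmf(k), 6))
```

For each number of winners k, the two printed probabilities should agree to several decimal places, which is what the equality of np and np(1 - p) up to a factor close to 1 suggests.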

The three distributions presented in this section are discrete. The first
two take a finite number of values whereas the last one, $\mathcal{P}(\lambda)$, has a countable
support, namely the set of all non-negative integers. The following section is devoted
to the most common continuous distributions appearing in financial models,
especially to model prices and returns.

3.2 Continuous distributions

3.2.1 Uniform distribution 

Definition 

Definition 86 $X$ follows a uniform distribution on the interval $[a; b]$,
$a < b$, if its density $f_X$ is given by:

$$f_X(x) = \begin{cases} \frac{1}{b - a} & \text{if } x \in [a; b] \\ 0 & \text{elsewhere} \end{cases}$$

We then denote $X \sim U([a; b])$.

The CDF ($F_X$) of $X$ is obtained by integrating the density.

$$F_X(x) = \begin{cases} \frac{x - a}{b - a} & \text{if } x \in [a; b] \\ 0 & \text{if } x < a \\ 1 & \text{if } x > b \end{cases}$$

Figure 3.2 shows the CDF of the uniform distribution on the interval
$[0; 1]$.

From this definition we deduce that, on any interval $[c; d]$ included in
$[a; b]$:

$$P_X([c; d]) = P_X(]c; d]) = \frac{d - c}{b - a} = F_X(d) - F_X(c)$$

The probability of a given subinterval is proportional to its length and
all subintervals of $[a; b]$ with a given length have the same probability (this
explains the name "uniform distribution"). We will see hereafter that the
moments of a uniform distribution are simple functions of $a$ and $b$.



Expectation and variance 



Proposition 87 If $X \sim U([a; b])$ then $E(X) = \frac{a + b}{2}$ and $\sigma^2(X) = \frac{(b - a)^2}{12}$.

Proof. If $X$ follows a uniform distribution on $[a; b]$, the expectation of $X$ is
given by:

$$E(X) = \int_{-\infty}^{+\infty} x f_X(x)\, dx = \frac{1}{b - a} \int_a^b x\, dx = \frac{1}{b - a} \left[\frac{x^2}{2}\right]_a^b = \frac{1}{2}\,\frac{b^2 - a^2}{b - a} = \frac{b + a}{2}$$

Figure 3.2: CDF of the uniform distribution on $[0; 1]$

In the same way, the variance is given by:

$$\sigma^2(X) = \int_{-\infty}^{+\infty} x^2 f_X(x)\, dx - \left(\frac{b + a}{2}\right)^2 = \frac{1}{b - a} \left[\frac{x^3}{3}\right]_a^b - \frac{(a + b)^2}{4}$$

$$= \frac{1}{3}\,\frac{b^3 - a^3}{b - a} - \frac{a^2 + 2ab + b^2}{4}$$

$$= \frac{1}{3}(a^2 + ab + b^2) - \frac{1}{4}(a^2 + 2ab + b^2)$$

$$= \frac{(b - a)^2}{12}$$
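
As a quick sanity check of Proposition 87, the two integrals can be approximated numerically. This is only a sketch: the interval [2; 5] and the number of integration steps are arbitrary assumptions, and the midpoint rule is used as a simple stand-in for exact integration.

```python
# Midpoint-rule check of E(X) = (a + b)/2 and variance = (b - a)^2 / 12.
a, b = 2.0, 5.0           # arbitrary interval chosen for the illustration
steps = 100_000
h = (b - a) / steps
density = 1 / (b - a)     # f_X(x) on [a; b]

mids = [a + (i + 0.5) * h for i in range(steps)]
mean = sum(x * density * h for x in mids)            # integral of x f_X(x)
second = sum(x * x * density * h for x in mids)      # integral of x^2 f_X(x)
var = second - mean**2

print(round(mean, 6), (a + b) / 2)
print(round(var, 6), (b - a) ** 2 / 12)
```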



3.2.2 Gaussian (normal) distribution 

Definition 

The normal distribution is the most common probability distribution in all
sciences. This is due to a mathematical result called the "central limit theorem",
which we will present in the next chapter. Without specifying the assumptions for the
moment, this theorem states that the sum of a large number of independent
and identically distributed (i.i.d.) random variables is approximately driven
by a Gaussian distribution. As a consequence, a number of statistical tests
are based on the Gaussian distribution. It also allows stock returns to be modeled
with reasonable accuracy.

Nevertheless, one has to remember that it is not the best fit to the
distribution of stock returns. Some empirical works show that alternative
distributions (like Lévy distributions) are better choices to take into account
the large variations (especially crashes) regularly encountered on financial
markets. However, these distributions do not have nice statistical properties, so
they are not often used in standard models. An important concern nowadays
(especially after the 2008 financial crisis) is that the "Gaussian world"
may be a dangerous assumption when defining risk measures such as Value
at Risk. Under this assumption, the measure does not properly take into account
crashes, liquidity crises and other extreme movements.

Definition 88 $X$ follows a Gaussian distribution with parameters $m$ and
$\sigma^2$ (also written $X \sim \mathcal{N}(m, \sigma^2)$) if its density $f_X$ is defined by:

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x - m}{\sigma}\right)^2\right)$$

$f_X$ is the famous bell curve encountered in all statistical textbooks. It is
symmetric with respect to the line $x = m$. Without entering into the details, it is
worth recalling that around 2/3 of the outcomes of a Gaussian distribution
lie in the interval $[m - \sigma; m + \sigma]$ and about 95% of the outcomes lie in the interval
$[m - 2\sigma; m + 2\sigma]$.
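
These two proportions can be recovered from the standard Gaussian CDF. The sketch below uses the error function from Python's math module; the helper names std_normal_cdf and prob_within are assumptions introduced for the illustration.

```python
from math import erf, sqrt

def std_normal_cdf(x):
    """CDF of the standard Gaussian distribution, written with the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def prob_within(k):
    """P(m - k*sigma <= X <= m + k*sigma) for a Gaussian X; it does not depend on m or sigma."""
    return std_normal_cdf(k) - std_normal_cdf(-k)

print(round(prob_within(1), 4))   # 0.6827, i.e. roughly 2/3
print(round(prob_within(2), 4))   # 0.9545, i.e. roughly 95%
```

After standardization, the probability depends only on the number k of standard deviations, which is why a single function covers every choice of the two parameters.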

Figure 3.3 shows the density of the Gaussian distribution $\mathcal{N}(0, 1)$, also
called the standard Gaussian distribution.

Figure 3.3: Density of $\mathcal{N}(0, 1)$

Moreover, most of the probability distributions used for statistical tests
are functions of the Gaussian distribution. This is the case for the $\chi^2$ distribution,
the Student and the Fisher-Snedecor distributions presented in the next
section.

Expectation and variance

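Proposition 89 If $X \sim \mathcal{N}(m, \sigma^2)$, $E(X) = m$ and $\sigma^2(X) = \sigma^2$.

Proof. The definition of the density gives, for the Gaussian distribution:

$$E(X) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{+\infty} x \exp\left(-\frac{1}{2}\left(\frac{x - m}{\sigma}\right)^2\right) dx$$

Denoting $y = \frac{x - m}{\sigma}$ allows us to write:

$$E(X) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} (\sigma y + m) \exp\left(-\frac{1}{2} y^2\right) dy$$

$$= \frac{\sigma}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} y \exp\left(-\frac{1}{2} y^2\right) dy + \frac{m}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} \exp\left(-\frac{1}{2} y^2\right) dy$$

$$= \frac{\sigma}{\sqrt{2\pi}} \left[-\exp\left(-\frac{1}{2} y^2\right)\right]_{-\infty}^{+\infty} + m = m$$

Finally, $E(X) = m$. The last equality comes from the fact that $\frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} y^2\right)$
is the density of a standardized normal variable.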

The same change of variable $y = \frac{x - m}{\sigma}$ leads to the following calculation of the variance:

$$\sigma^2(X) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} (\sigma y + m)^2 \exp\left(-\frac{1}{2} y^2\right) dy - m^2$$

$$= \frac{\sigma^2}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} y^2 \exp\left(-\frac{1}{2} y^2\right) dy + \frac{2m\sigma}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} y \exp\left(-\frac{1}{2} y^2\right) dy$$

The second term of the RHS is equal to 0 as the expectation of a
zero-mean Gaussian variable (times $2m\sigma$); the first term is integrated by parts.

$$\frac{\sigma^2}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} y \times y \exp\left(-\frac{1}{2} y^2\right) dy = \sigma^2 \left( \frac{1}{\sqrt{2\pi}} \left[-y \exp\left(-\frac{1}{2} y^2\right)\right]_{-\infty}^{+\infty} + \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} \exp\left(-\frac{1}{2} y^2\right) dy \right)$$

The first term between brackets is zero and the second term is equal to
1. It proves that $\sigma^2(X) = \sigma^2$. ■

As can be seen in the density, the Gaussian distribution is entirely
determined by its first two moments. This property is especially interesting
when stock returns are Gaussian. It means that portfolio choice is uniquely
guided by these first two moments. One doesn't need to assume quadratic
utility functions to manage the portfolio problem in the mean-variance world
of Markowitz (1952).



3.2.3 Log-normal distribution 

Definition 

The continuous return of a financial security between dates 0 and $t$ is given
by $r = \ln\left(\frac{S_t}{S_0}\right)$ if $S_t$ denotes the date-$t$ price ($t > 0$). Consequently, getting
a price when starting from a return requires an exponential transformation,
writing $S_t = S_0 e^r$. Using proposition 38 of chapter 1 leads to the
characterization of a Log-normal random variable.



Definition 90 $X$ follows a Log-normal distribution with parameters $m$
and $\sigma^2$ if $\ln(X) \sim \mathcal{N}(m, \sigma^2)$. The density of $X$ is given by:

$$f_X(x) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{\ln(x) - m}{\sigma}\right)^2\right), \quad x > 0$$

We note $X \sim LN(m, \sigma^2)$.

Figure 3.4 shows the density of the Log-normal distribution with
parameters $m = 0$ and $\sigma = 1$.




Figure 3.4: Density of the Lognormal distribution 

Contrary to what was observed for the Gaussian distribution, the density 
is not symmetric. It explains some surprising results like the one illustrated 
in example 93 at the end of this chapter. 

A Log-normal distribution is the usual assumption for stock prices,
especially in the Black-Scholes option valuation model. It is worth noticing that
it takes only positive values, a characteristic consistent with stock prices, due
to the limited liability of shareholders.






Expectation and variance 

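Proposition 91 If $X \sim LN(m, \sigma^2)$, $E(X) = \exp\left(m + \frac{\sigma^2}{2}\right)$ and
$\sigma^2(X) = \exp(2m + \sigma^2)\left(\exp(\sigma^2) - 1\right)$.

Proof. The expectation is calculated as follows:

$$E(X) = \frac{1}{\sigma\sqrt{2\pi}} \int_0^{+\infty} \exp\left(-\frac{1}{2}\left(\frac{\ln(x) - m}{\sigma}\right)^2\right) dx$$

We use $y = \ln(x)$, and rewrite the expectation as:

$$E(X) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{+\infty} \exp\left(-\frac{1}{2}\left(\frac{y - m}{\sigma}\right)^2\right) \exp(y)\, dy$$

Rearranging the terms in the exponentials leads to:

$$E(X) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{+\infty} \exp\left(-\frac{1}{2}\left(\frac{y - (m + \sigma^2)}{\sigma}\right)^2\right) \exp\left(m + \frac{\sigma^2}{2}\right) dy = \exp\left(m + \frac{\sigma^2}{2}\right)$$

The integral is equal to 1 because it is the integral of the density of a normal
random variable with mean $(m + \sigma^2)$ and variance $\sigma^2$.

The same method is used to calculate $V(X)$. We first show that $E(X^2) =
\exp(2(m + \sigma^2))$ and we finally get $V(X) = \exp(2m + \sigma^2)\left(\exp(\sigma^2) - 1\right)$. ■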
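
The two moments of the Log-normal distribution can be cross-checked by numerical integration over $y = \ln(x)$. The sketch below is illustrative: the parameter values, the integration width and the helper names normal_pdf and expect_exp are assumptions made for this check.

```python
from math import exp, pi, sqrt

m, sigma = 0.05, 0.2      # illustrative parameters (assumptions)

def normal_pdf(y):
    """Density of N(m, sigma^2)."""
    return exp(-0.5 * ((y - m) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def expect_exp(power, steps=20_000, width=12.0):
    """Midpoint-rule approximation of E[exp(power * Y)] for Y ~ N(m, sigma^2)."""
    lo = m - width * sigma
    h = 2 * width * sigma / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        total += exp(power * y) * normal_pdf(y) * h
    return total

mean_num = expect_exp(1)                      # E(X) with X = exp(Y)
var_num = expect_exp(2) - mean_num**2         # E(X^2) - E(X)^2

mean_formula = exp(m + sigma**2 / 2)
var_formula = exp(2 * m + sigma**2) * (exp(sigma**2) - 1)

print(round(mean_num, 8), round(mean_formula, 8))
print(round(var_num, 8), round(var_formula, 8))
```

Note that the mean of $X$ exceeds $\exp(m)$: the convexity of the exponential adds the term $\sigma^2 / 2$.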

Example 92 Let $Y \sim \mathcal{N}(0, 1)$ and $X$ the random variable defined by:

$$X = \exp\left(\left(m - \frac{\sigma^2}{2}\right) + \sigma Y\right)$$

where $m$ and $\sigma$ are real numbers, $\sigma > 0$. $X$ represents the date-1 price of
a stock whose return is Gaussian with parameters $m$ and $\sigma$ when the date-0
price is equal to 1.

A call option contract on $X$ with exercise price $K$ and maturity 1 is a
financial security paying $\max(X - K; 0)$ at the maturity date.

What is the expected value of this final payoff?

We need to calculate $E\left[(X - K)_+\right]$ where $(x)_+ = x$ if $x > 0$ and $(x)_+ = 0$
elsewhere. Denoting $f_X$ the density of $X$, we get:

$$E\left[(X - K)_+\right] = \int_0^{+\infty} f_X(x) \max(x - K; 0)\, dx = \int_K^{+\infty} f_X(x)(x - K)\, dx \qquad (3.7)$$

$$= \int_K^{+\infty} x f_X(x)\, dx - K \int_K^{+\infty} f_X(x)\, dx \qquad (3.8)$$

$$= \int_K^{+\infty} x f_X(x)\, dx - K\,P(X > K) \qquad (3.9)$$

$$= \int_K^{+\infty} x f_X(x)\, dx - K\,P(\ln(X) > \ln(K)) \qquad (3.10)$$

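We know by definition of $X$ that

$$P(\ln(X) > \ln(K)) = P\left(\left(m - \frac{\sigma^2}{2}\right) + \sigma Y > \ln(K)\right) \qquad (3.11)$$

$$= P\left(Y > \frac{\ln(K) - \left(m - \frac{\sigma^2}{2}\right)}{\sigma}\right) \qquad (3.12)$$

If we denote $N(x)$ the CDF of a standard Gaussian variable, we obtain:

$$P(X > K) = 1 - N\left(\frac{\ln(K) - \left(m - \frac{\sigma^2}{2}\right)}{\sigma}\right) \qquad (3.13)$$

$$= N\left(\frac{-\ln(K) + \left(m - \frac{\sigma^2}{2}\right)}{\sigma}\right) \qquad (3.14)$$

The last equality comes from the symmetry of the density of a Gaussian
variable.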

Using the technique in proposition 91, the first term in equation 3.10 may
be written as:

$$\int_K^{+\infty} x f_X(x)\, dx = e^m N\left(\frac{-\ln(K) + \left(m + \frac{\sigma^2}{2}\right)}{\sigma}\right) \qquad (3.15)$$

These formulas are the basis of the famous option valuation model
developed in the seventies by Fisher Black and Myron Scholes (Black and Scholes,
1973).
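
Combining equations 3.10, 3.14 and 3.15 gives the expected payoff in closed form, $E[(X - K)_+] = e^m N(d_1) - K N(d_2)$, where $d_1$ and $d_2$ denote the arguments of $N$ in equations 3.15 and 3.14. The sketch below compares this expression with a direct numerical integration over $Y$. The parameter values and the function names are assumptions for the illustration, and the quantity computed is the undiscounted expected payoff of Example 92, not an option price.

```python
from math import erf, exp, log, pi, sqrt

def N(x):
    """CDF of the standard Gaussian distribution."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def expected_call_payoff(m, sigma, K):
    """E[(X - K)+] for X = exp((m - sigma^2/2) + sigma*Y), Y ~ N(0, 1)."""
    d1 = (-log(K) + m + sigma**2 / 2) / sigma     # argument of N in (3.15)
    d2 = (-log(K) + m - sigma**2 / 2) / sigma     # argument of N in (3.14)
    return exp(m) * N(d1) - K * N(d2)

def expected_call_payoff_numeric(m, sigma, K, steps=20_000, width=10.0):
    """Midpoint-rule integration of max(X - K, 0) against the density of Y."""
    h = 2 * width / steps
    total = 0.0
    for i in range(steps):
        y = -width + (i + 0.5) * h
        x = exp(m - sigma**2 / 2 + sigma * y)
        total += max(x - K, 0.0) * exp(-0.5 * y * y) / sqrt(2 * pi) * h
    return total

m, sigma, K = 0.05, 0.2, 1.0      # illustrative values (assumptions)
closed_form = expected_call_payoff(m, sigma, K)
numeric = expected_call_payoff_numeric(m, sigma, K)
print(round(closed_form, 5), round(numeric, 5))
```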

Example 93 Assume that the logarithmic return of the S&P500 is driven
by a normal distribution with parameters $m = 3\%$ and $\sigma = 20\%$. The current
value of the index is 1000 points. What is the price a risk-neutral investor
is ready to pay to buy a contract delivering \$100 if the S&P500 value in one
year is in the interval $[900; 1000]$? What price is he ready to pay if the interval
is $[1000; 1100]$? Why may it be a surprising result?

3.3 Some other useful distributions

When using large samples of i.i.d. variables, statistics like the mean approximately follow
a Gaussian distribution according to the central limit theorem (see chapter
4). The variance is then written as a function of Gaussian random variables.
Moreover, standardizing variables (transforming them to get a zero mean and
a unit variance) leads to more or less complicated functions of Gaussian
variables. This is the reason why a number of useful distributions in statistics
are "derived" from the Gaussian distribution. We briefly present hereafter
the $\chi^2$ distribution, the Student-t and the Fisher-Snedecor distributions.

3.3.1 The $\chi^2$ distribution

Definition 94 A random variable $Y$ follows a $\chi^2$ distribution with $n$
degrees of freedom if $Y$ can be written as:

$$Y = \sum_{i=1}^{n} X_i^2 \qquad (3.16)$$

where the $X_i$ are independent standard Gaussian variables, that is, $\forall i$,
$X_i \sim \mathcal{N}(0, 1)$.

This distribution is useful when one wants to perform a statistical test for
the variance $\sigma^2$ of a random variable. If $(X_1, \dots, X_n)$ are independent and identically distributed
Gaussian random variables with mean $m$ and variance $\sigma_0^2$, the variable $Y$ defined
by:

$$Y = \sum_{i=1}^{n} \left(\frac{X_i - m}{\sigma_0}\right)^2 \qquad (3.17)$$

follows a $\chi^2$ distribution with $n$ degrees of freedom. Equality 3.17 leads to:

$$\frac{\sigma_0^2\, Y}{n} = \frac{1}{n} \sum_{i=1}^{n} (X_i - m)^2 \qquad (3.18)$$

The expression on the right-hand side of equation 3.18 is the empirical
variance. If $m$ is unknown and estimated by $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$, the variable
$Y^*$, obtained by replacing $m$ by $\bar{X}$ in equation 3.17, follows a $\chi^2$ distribution
with $n - 1$ degrees of freedom. In this case $\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$ is used as an
unbiased estimator of the variance to perform the test.

Figure 3.5: Comparison of the densities of a Student(6) distribution and a Gaussian distribution

The $\chi^2$ distribution also appears in the well-known $\chi^2$ test aimed at
testing the independence of two variables, and in the test comparing an
empirical distribution to a theoretical distribution.

3.3.2 The Student-t distribution

Definition 95 A random variable $Y$ follows a Student-t distribution
with $n$ degrees of freedom if $Y$ can be written:

$$Y = \frac{Z}{\sqrt{X / n}} \qquad (3.19)$$

where $Z$ is a standard Gaussian variable and $X$ follows a $\chi^2$ distribution
with $n$ degrees of freedom, $Z$ and $X$ being independent.

The Student-t distribution is used to test the equality of means in two
populations, or to test regression coefficients. For example, such tests are common
when testing the market model or the Capital Asset Pricing Model.

In figure 3.5, we can see that the Student density (bold line) has fatter
tails than the Gaussian density (thin line) when the number of degrees of
freedom is low (6 in the example). It is then sometimes used to represent
stock returns when one wants to take into account that extreme returns
are more frequent on real markets than what is predicted by a Gaussian
distribution.

More generally, a Student($n$) has a variance equal to $n/(n - 2)$ and a
kurtosis equal to $3(n - 2)/(n - 4)$. The kurtosis is only defined when $n > 4$. We observe that
for $n = 6$, the kurtosis is equal to 6, which is greater than the corresponding
moment for the Gaussian distribution (equal to 3).

3.3.3 The Fisher-Snedecor distribution

In a multiple regression, beyond the significance of the individual regression
coefficients, most software packages provide the so-called F of the regression. It comes
from the Fisher-Snedecor distribution, defined as follows.

Definition 96 A random variable $Y$ follows a Fisher-Snedecor
distribution if it can be written:

$$Y = \frac{X_1 / n_1}{X_2 / n_2} \qquad (3.20)$$

where $X_1$ ($X_2$) follows a $\chi^2$ distribution with $n_1$ ($n_2$) degrees of freedom, $X_1$ and $X_2$ being independent.

It can be seen that the inverse of an $F(n_1, n_2)$ variable is an $F(n_2, n_1)$ variable.
This is the reason why statistical tables of F variables only provide values
greater than 1. When you get an observed value below 1 when testing the
equality of two variances, take the inverse, swap $n_1$ and $n_2$ and look at
the corresponding position in the statistical table. Obviously, when testing
the relevance of a regression model, the two variances do not play symmetric roles.
You just want to know if the variance explained by the model is significantly
greater than the unexplained variance. In this case, an F statistic lower than
1 simply means that your model is not the right one.

Chapter 4

Conditional expectations and 
Limit theorems 

As already mentioned in chapter 1, learning a piece of information changes
the probabilities of events. It then changes the expectations of random variables
because these expectations are probability-weighted averages. Conditional
expectations are a natural tool to address this issue. Moreover, conditional
expectations play an important role in valuation models. One essential result
in finance theory is the following: when there are no arbitrage opportunities,
the date-$t$ value of an asset is the discounted expected value$^{1}$ of its date-$(t + 1)$ value,
conditional on the information known at date $t$. In the book to follow
(Stochastic Processes for Finance, Roger, 2010), conditional expectations will
play an even more important role. In multi-period models, the stochastic
processes called martingales are fundamental, and their definition relies
essentially on conditional expectations. This is the reason why we want to insist
now on the importance of understanding this (maybe difficult) topic.

4.1 Conditional expectations 
4.1.1 Introductory example

We start with a very simple framework allowing us to describe carefully what
is going on when a piece of information is revealed. Consider a probability space
$(\Omega, \mathcal{A}, P)$ where $\Omega = \{\omega_1, \omega_2, \omega_3, \omega_4\}$, $\mathcal{A} = \mathcal{P}(\Omega)$ and $P(\omega_i) = 0.25$ for all
$i = 1, \dots, 4$. Two random variables $X$ and $Y$ are defined on $\Omega$ and their values
appear in table 4.1.

1 The expectation is calculated with respect to a specific probability measure called the
risk-neutral measure. In this framework, the risk-free rate is used for discounting.

State        X    Y
$\omega_1$   1    1
$\omega_2$   2    1
$\omega_3$   3    2
$\omega_4$   4    2

Table 4.1: Definition of X and Y

We get immediately:

$$E(X) = \frac{1}{4}(1 + 2 + 3 + 4) = 2.5 \qquad (4.1)$$
$$E(Y) = \frac{1}{4}(1 + 1 + 2 + 2) = 1.5 \qquad (4.2)$$

Suppose now that the value of $Y$ is observed before the value of $X$ is revealed.

If $Y(\omega) = 1$, the true state $\omega$ may be either $\omega_1$ or $\omega_2$. In other words,
the event $\{\omega_1, \omega_2\}$ occurs and it is equal to the event $\{Y = 1\}$. Probabilities
of all states change and become conditional on the event $\{Y = 1\}$. The new
(conditional) probabilities for the 4 states are now:

$$\left(P(\omega_i | \{Y = 1\}),\ i = 1, \dots, 4\right) = \left(\tfrac{1}{2};\ \tfrac{1}{2};\ 0;\ 0\right) \qquad (4.3)$$

The other consequence is the change in the expectation of $X$ which
becomes the conditional expectation denoted $E(X | \{Y = 1\})$ with

$$E(X | \{Y = 1\}) = \sum_{i=1}^{4} X(\omega_i) P(\omega_i | \{Y = 1\}) = \frac{1}{2}(1 + 2) = 1.5 \qquad (4.4)$$

In other words, if $E(X)$ is the initial price of a stock, $\{Y = 1\}$ corresponds
to bad news leading to a price decrease.

The important fact here is that the change of probabilities and
expectations is not linked directly to the values of $Y$ but to the information revealed
by the observation of $Y$. If the values of $Y$, 1 and 2 in the example, had
been replaced by 100 and 200, the result would have been the same. The
conditional expectation of $X$ would also be 1.5. Each event with respect
to which we define conditional probabilities generates another conditional
expectation.

4.1.2 Conditional distributions

Discrete variables 

Let us first consider two discrete variables $X$ and $Y$ whose supports are
respectively $(x_i, i = 1, \dots, n)$ and $(y_j, j = 1, \dots, p)$.

Definition 97 a) The conditional probability distribution of $X$ knowing
$\{Y = y_j\}$ is the mapping denoted as $P_{X|Y}(\cdot\,| y_j)$ and defined by:

$$P_{X|Y}(x_i | y_j) = P(\{X = x_i\} | \{Y = y_j\}) = \frac{P(\{X = x_i\} \cap \{Y = y_j\})}{P(\{Y = y_j\})}$$

In this definition it is assumed that $P(\{Y = y_j\}) \neq 0$, but it is in fact
implicit in the definition of the support of $Y$. $P_{X|Y}(\cdot\,| y_j)$ effectively induces
a probability measure on the support of $X$.

Continuous variables

Denote $f_{XY}$ the joint density$^{2}$ of a couple of continuous random variables,
$f_X$ and $f_Y$ being the densities of $X$ and $Y$.

Definition 98 For any $y$ satisfying $f_Y(y) > 0$, the conditional density of
$X$ w.r.t. $\{Y = y\}$ is the function $f_{X|Y}(\cdot\,| y)$ defined by:

$$f_{X|Y}(x | y) = \frac{f_{XY}(x, y)}{f_Y(y)}$$

Remark 99 When $X$ is continuous with density $f_X$ and $B$ is an event such
that $P(B) \neq 0$, the density of $X$ conditional on $B$ is defined by:

$$f_X(x | B) = \begin{cases} \frac{f_X(x)}{P(B)} & \text{if } x \in X(B) \\ 0 & \text{elsewhere} \end{cases}$$

We can now characterize conditional expectations, starting with the
simplest case of conditioning with respect to an event.

4.1.3 Conditional expectation with respect to an event 

The introductory example shows how to define the conditional expectation
of a random variable with respect to an event in $\mathcal{A}$.

Definition 100 a) The conditional expectation of a discrete variable $X$,
taking values $x_1, \dots, x_N$, w.r.t. an event $B$ in $\mathcal{A}$, is the quantity
$E(X \mid B)$ defined by:

$$E(X \mid B) = \sum_{i=1}^{N} x_i\, P(\{X = x_i\} \mid B) \tag{4.5}$$

b) The conditional expectation of a continuous variable $X$ with density
$f_X$ w.r.t. an event $B$ in $\mathcal{A}$, is the quantity $E(X \mid B)$ defined by:

$$E(X \mid B) = \int x\, f_X(x \mid B)\, dx = \frac{1}{P(B)} \int_{X(B)} x\, f_X(x)\, dx$$

2 See chapter 2, definition 77. 
In the introductory example at the beginning of this section, if $\{Y = 2\}$
occurs, we obtain:

\begin{align}
E(X \mid \{Y = 2\}) &= \sum_{i=1}^{N} x_i\, P(\{\omega_i\} \mid \{Y = 2\}) \tag{4.6} \\
&= 3 \times P(\omega_3 \mid \{Y = 2\}) + 4 \times P(\omega_4 \mid \{Y = 2\}) \tag{4.7} \\
&= \tfrac{1}{2}(3 + 4) = 3.5 \tag{4.8}
\end{align}

Equality (4.7) comes from $P(\omega_1 \mid \{Y = 2\}) = P(\omega_2 \mid \{Y = 2\}) = 0$.

We could have saved some space and notation if we had addressed the
problem in a more general way. In fact, we could define directly the
conditional expectation of $X$ with respect to the random variable $Y$. Before
knowing the value of $Y$, we already know how to calculate the conditional
expectation if one of the two events occurs. This remark allows us to propose a
more general approach.



4.1.4 Conditional expectation with respect to a random variable

Discrete variables

Definition 101 The conditional expectation of a discrete variable $X$, taking
values $x_1, \dots, x_N$, w.r.t. a discrete random variable $Y$, taking distinct values
$y_1, \dots, y_M$, denoted as $E(X \mid Y)$, is the random variable defined by:

$$\forall \omega \in \{Y = y_j\}, \quad E(X \mid Y)(\omega) = \sum_{i=1}^{N} x_i\, P(\{X = x_i\} \mid \{Y = y_j\}) \tag{4.9}$$

This leads to the characterization of the conditional expectation of X w.r.t. Y
given in table 4.2.

State        E(X | Y)
ω1           1.5
ω2           1.5
ω3           3.5
ω4           3.5

Table 4.2: Conditional expectation of X with respect to Y
Continuous variables

Suppose now that $X$ and $Y$ are continuous with densities $f_X$ and $f_Y$, the
conditional density being denoted $f_{X|Y}(x \mid y)$ as before. The conditional
expectation of $X$ w.r.t. $\{Y = y\}$ is written:

$$E(X \mid \{Y = y\}) = \int_{-\infty}^{+\infty} x\, f_{X|Y}(x \mid y)\, dx$$

More generally, the conditional expectation of $X$ w.r.t. $Y$ is the random
variable defined by:

$$\forall \omega \in \{Y = y\}, \quad E(X \mid Y)(\omega) = \int_{-\infty}^{+\infty} x\, f_{X|Y}(x \mid y)\, dx$$

Remark 102 We observe that, when $Y$ is discrete, the subsets $\{Y = y_j\}$
define a partition of $\Omega$. Second, the value of the random variable $E(X \mid Y)$
is constant on each subset of the partition, and it is also true for $Y$, by
construction. In other words, the information revealed by $Y$ is the same as
the information revealed by $E(X \mid Y)$. A key remark here is that $E(X \mid Y)$ is
$\mathcal{B}_Y$-measurable, where $\mathcal{B}_Y$ is the tribe generated by $Y$. It leads to the
general approach of conditional expectations.
4.1.5 Conditional expectation with respect to a sub-tribe

The examples provided before for discrete variables showed that the key point 
when conditioning with respect to a random variable is not the values taken 
by this variable but the information these values reveal about the true events. 
Consequently, the most general way to define conditional expectations is to
condition with respect to subsets of events or, more precisely, with respect
to sub-tribes.

Definition 103 The conditional expectation of an integrable random variable
$X$ (that is, $X \in L^1(\Omega, \mathcal{A}, P)$), w.r.t. a sub-tribe $\mathcal{B}$ of $\mathcal{A}$, is any
$\mathcal{B}$-measurable random variable $Z$ satisfying:

$$\forall B \in \mathcal{B}, \quad E(Z \mathbf{1}_B) = E(X \mathbf{1}_B) \tag{4.10}$$

This definition deserves several comments.

• As $Z$ is defined by means of integrals, two variables $Z$ and $Z'$ can
both satisfy equality (4.10) if they differ only on negligible events. They are
called versions of the conditional expectation. Any version used
in calculations is in general denoted $E(X \mid \mathcal{B})$.

• The definition also means that a variable $X$ and its conditional expectation
$E(X \mid \mathcal{B})$ have the same mean on any event of the tribe $\mathcal{B}$. We
let the reader check it was actually the case in the former example (see
table 4.1).

• Equality (4.10) implies that if $X$ is $\mathcal{B}$-measurable, $E(X \mid \mathcal{B}) = X$.

Example 104 Let $\mathrm{Card}(\Omega) = 4$, $P(\omega_i) = p_i$ for each $\omega_i$, and $\mathcal{B}$ defined by:

$$\mathcal{B} = \{\emptyset, \{\omega_1, \omega_2\}, \{\omega_3, \omega_4\}, \Omega\}$$

Denote $B_1 = \{\omega_1, \omega_2\}$, $B_2 = \{\omega_3, \omega_4\}$ and let $X$ be defined by³ $X = (x_1; x_2; x_3; x_4)$.
Equality (4.10) implies:

\begin{align}
p_1 x_1 + p_2 x_2 &= p_1 z_1 + p_2 z_2 \tag{4.11} \\
p_3 x_3 + p_4 x_4 &= p_3 z_3 + p_4 z_4 \tag{4.12}
\end{align}

3 As $\mathrm{Card}(\Omega) = 4$, $X$ is defined by the vector of values it takes on the four states of
nature.

where the conditional expectation $Z$ takes values $(z_1; z_2; z_3; z_4)$. Equation
4.11 refers to event $B_1$ and equation 4.12 to event $B_2$. Moreover, $Z$ is
$\mathcal{B}$-measurable; it means that it is constant on $B_1$ and on $B_2$. It implies that:

$$z_1 = z_2$$

$$z_3 = z_4$$

We finally get:

$$z_1 = z_2 = \frac{1}{p_1 + p_2}\left[p_1 x_1 + p_2 x_2\right] = E(X \mid B_1)$$

$$z_3 = z_4 = \frac{1}{p_3 + p_4}\left[p_3 x_3 + p_4 x_4\right] = E(X \mid B_2)$$

The result is intuitive. The conditional expectation on $B_1$ ($B_2$) is the
(conditional probability) weighted average of the values taken by $X$ on this
subset $B_1$ ($B_2$).

We also check here that if $X$ was already $\mathcal{B}$-measurable, then $E(X \mid \mathcal{B})$
would be equal to $X$.
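
Example 104 can be checked numerically. The sketch below uses exact fractions; the probabilities and the values of X are hypothetical numbers chosen only for illustration, and the function names are ours. It builds Z from the weighted averages obtained above and then verifies the defining equality (4.10) on the four events of the sub-tribe.

```python
from fractions import Fraction

# Hypothetical numbers for Example 104 (chosen only for illustration).
p = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
x = [Fraction(5), Fraction(-1), Fraction(2), Fraction(8)]
partition = [[0, 1], [2, 3]]   # B_1 = {w1, w2}, B_2 = {w3, w4}


def conditional_expectation(x, p, partition):
    """Return Z = E(X | B) as a vector, B being generated by `partition`."""
    z = [None] * len(x)
    for block in partition:
        mass = sum(p[i] for i in block)
        value = sum(p[i] * x[i] for i in block) / mass
        for i in block:
            z[i] = value   # Z is constant on each block, hence B-measurable
    return z


def expectation_on(v, p, event):
    """E(V 1_event) for an event given as a list of state indices."""
    return sum(p[i] * v[i] for i in event)


z = conditional_expectation(x, p, partition)

# Defining property (4.10): E(Z 1_B) = E(X 1_B) for every B in the sub-tribe.
events = [[], [0, 1], [2, 3], [0, 1, 2, 3]]
checks = [expectation_on(z, p, B) == expectation_on(x, p, B) for B in events]
print(z, checks)
```

Applying the same function to a vector that is already constant on each block returns it unchanged, in line with the last remark of the example.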



4.2 Geometric interpretation in $L^2(\Omega, \mathcal{A}, P)$

Conditional expectations have a natural geometric interpretation when the
analysis is restricted to square integrable random variables, that is to
elements of the vector space $L^2(\Omega, \mathcal{A}, P)$. We therefore assume this is the
case in this section.



4.2.1 Introductory example

To explain what the "geometry" of conditional expectations is, first consider
a simple example in the two-dimensional space $\mathbb{R}^2$, endowed with the usual
metric:

$$d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}$$

where $x' = (x_1, x_2)$ and $y' = (y_1, y_2)$.

For a given $x \in \mathbb{R}^2$, suppose that we want to determine the point
$z = (z_1, z_2)$ on the bisector of the positive orthant which is the closest to $x$. We
have to solve:

$$\min_{z_1}\ (x_1 - z_1)^2 + (x_2 - z_1)^2$$

because points on the bisector have the same coordinates $z_1 = z_2$.
We find immediately $z_1 = \frac{x_1 + x_2}{2}$. In other words, $z$ is the orthogonal
projection⁴ of $x \in \mathbb{R}^2$ on the one-dimensional subspace defined by the bisector.

The reason is that $z - x$ is orthogonal to $z$:

\begin{align}
\langle z - x, z \rangle &= (z_1 - x_1) z_1 + (z_1 - x_2) z_1 \tag{4.13} \\
&= \frac{x_2 - x_1}{2} z_1 + \frac{x_1 - x_2}{2} z_1 = 0 \tag{4.14}
\end{align}

Suppose now that $\mathbb{R}^2$ is endowed with the metric:

$$d^*(x, y) = \sqrt{p (x_1 - y_1)^2 + q (x_2 - y_2)^2}$$

with $p + q = 1$, $p > 0$, $q > 0$. It simply means that the two coordinates are
not equally weighted.

Solving the same optimization problem leads to:

$$z_1 = p x_1 + q x_2$$

$z_1$ is then a weighted average (we are tempted to write "an expectation")
of the components of $x$.
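
The first-order condition behind this weighted average can be written out explicitly. The derivation below is only a sketch: it minimizes the squared distance, which has the same minimizer as the distance itself, and uses nothing beyond $p + q = 1$.

```latex
\frac{d}{dz_1}\left[ p (x_1 - z_1)^2 + q (x_2 - z_1)^2 \right]
  = -2p (x_1 - z_1) - 2q (x_2 - z_1) = 0
\quad \Longrightarrow \quad
z_1 = \frac{p x_1 + q x_2}{p + q} = p x_1 + q x_2
```

The second derivative equals $2(p + q) = 2 > 0$, so the stationary point is a minimum. Taking $p = q = \frac{1}{2}$ gives back $z_1 = \frac{x_1 + x_2}{2}$, the solution obtained with the usual metric.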



4.2.2 Conditional expectation as a projection in $L^2$

The preceding approach can be applied almost without modification to the
vector space of square-integrable random variables. If $X$ is an element of
$L^2(\Omega, \mathcal{A}, P)$, the conditional expectation $E(X \mid \mathcal{B})$ is $\mathcal{B}$-measurable and so
belongs to the subspace⁵ of $\mathcal{B}$-measurable variables denoted $L^2(\Omega, \mathcal{B}, P)$.

In example 104, $L^2(\Omega, \mathcal{A}, P)$ could be identified with $\mathbb{R}^4$ and $L^2(\Omega, \mathcal{B}, P)$
with $\mathbb{R}^2$ since the variables in this subspace had only two different components.
We are going to show that $E(X \mid \mathcal{B})$ is the orthogonal projection of $X$ on
$L^2(\Omega, \mathcal{B}, P)$. In other words, $E(X \mid \mathcal{B})$ solves the optimization problem:

$$\min_{Z \in L^2(\Omega, \mathcal{B}, P)} E\left[(X - Z)^2\right] = \min_{Z \in L^2(\Omega, \mathcal{B}, P)} d(X, Z)^2 = E\left[(X - E(X \mid \mathcal{B}))^2\right]$$

To keep things simple, we just show this property with the data of example
104. As $E(X \mid \mathcal{B})$ is $\mathcal{B}$-measurable, we know that

\begin{align}
z_1 &= z_2 \tag{4.15} \\
z_3 &= z_4 \tag{4.16}
\end{align}

4 Remember that two vectors are orthogonal when their inner product is zero.

5 To be completely rigorous, we should adopt a different notation for $P$ (for example
$P_{\mathcal{B}}$) because it is defined on the sub-tribe $\mathcal{B}$ in $L^2(\Omega, \mathcal{B}, P)$. For the sake of
simplicity, we keep $P$ to denote the probability measure on $\mathcal{B}$.
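Therefore,

$$E\left[(X - Z)^2\right] = p_1 (x_1 - z_1)^2 + p_2 (x_2 - z_1)^2 + p_3 (x_3 - z_3)^2 + p_4 (x_4 - z_3)^2 \tag{4.17}$$

The partial derivatives with respect to $z_1$ and $z_3$ must be zero to obtain
an optimum.

\begin{align}
\frac{\partial E\left[(X - Z)^2\right]}{\partial z_1} &= -2\left[p_1 (x_1 - z_1) + p_2 (x_2 - z_1)\right] = 0 \tag{4.18} \\
\frac{\partial E\left[(X - Z)^2\right]}{\partial z_3} &= -2\left[p_3 (x_3 - z_3) + p_4 (x_4 - z_3)\right] = 0 \tag{4.19}
\end{align}

The minimum is obtained with:

\begin{align}
z_1 = z_2 &= \frac{1}{p_1 + p_2}(p_1 x_1 + p_2 x_2) = E(X \mid \mathcal{B})(\omega_1) = E(X \mid \mathcal{B})(\omega_2) \tag{4.20} \\
z_3 = z_4 &= \frac{1}{p_3 + p_4}(p_3 x_3 + p_4 x_4) = E(X \mid \mathcal{B})(\omega_3) = E(X \mid \mathcal{B})(\omega_4) \tag{4.21}
\end{align}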

The reader can check that the second partial derivatives are positive, 
ensuring that the stationary point is a minimum (also because the cross- 
derivatives are 0). We are done. 

The properties of conditional expectations can now be presented in a 
more intuitive way, using this geometrical interpretation. 
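4.3 Properties of conditional expectations

Proposition 105 Let $X$ and $Y$ be two random variables in $L^2(\Omega, \mathcal{A}, P)$ and
$\mathcal{B}, \mathcal{B}'$ two sub-tribes of $\mathcal{A}$ satisfying $\mathcal{B} \subset \mathcal{B}'$:

1) If $X$ is a constant $c \in \mathbb{R}$, $E(X \mid \mathcal{B}) = c$

2) $\forall (a, b) \in \mathbb{R}^2$, $E(aX + bY \mid \mathcal{B}) = a E(X \mid \mathcal{B}) + b E(Y \mid \mathcal{B})$

3) If $X \leq Y$, $E(X \mid \mathcal{B}) \leq E(Y \mid \mathcal{B})$

4) $E(E(X \mid \mathcal{B}') \mid \mathcal{B}) = E(X \mid \mathcal{B})$

5) If $X$ is $\mathcal{B}$-measurable, $E(XY \mid \mathcal{B}) = X E(Y \mid \mathcal{B})$

6) If $X$ is independent of $\mathcal{B}$, $E(X \mid \mathcal{B}) = E(X)$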

We do not provide all the details of the proof but it is worth underlining
the intuitions leading to some of these results.

First, a constant $c$ can also be written $c \mathbf{1}_\Omega$. It is then a random variable
measurable with respect to any tribe, especially w.r.t. $\mathcal{B}$. The projection
theorem then implies that $c$ is its own projection on $L^2(\Omega, \mathcal{B}, P)$. Remember
that $L^2(\Omega, \mathcal{B}, P)$, being a vector subspace of $L^2(\Omega, \mathcal{A}, P)$, is a convex set.

Points (2) and (3) are direct consequences of the definition of conditional
expectations.

Point (4), which doesn't seem obvious at first glance, may be easily
understood using the geometric interpretation of the conditional expectation.
$E(X \mid \mathcal{B}')$ is the orthogonal projection of $X$ on $L^2(\Omega, \mathcal{B}', P)$.
$E(E(X \mid \mathcal{B}') \mid \mathcal{B})$ is the projection on $L^2(\Omega, \mathcal{B}, P)$ of $E(X \mid \mathcal{B}')$.

Point (4) simply says that projecting first on $L^2(\Omega, \mathcal{B}', P)$ and then on
$L^2(\Omega, \mathcal{B}, P)$ is equivalent to projecting directly on the smaller
space $L^2(\Omega, \mathcal{B}, P)$. It is a well-known property of projections on
finite-dimensional spaces. Moreover, it is worth noticing that if $\mathcal{B} = \{\emptyset, \Omega\}$,
$E(X \mid \mathcal{B}) = E(X)$ and then $E(E(X \mid \mathcal{B}')) = E(X)$ whatever $\mathcal{B}'$ is.

Point (6) could be written $E(X - E(X) \mid \mathcal{B}) = 0$ since $E(X)$ is a constant
(see point (1)). In other words, $X - E(X)$ being independent of any variable $Y$ in
$L^2(\Omega, \mathcal{B}, P)$ means:

$$E((X - E(X)) Y) = E(X - E(X)) E(Y) = 0 \tag{4.22}$$

The left-hand side of equality 4.22 is the inner product
of $X - E(X)$ and $Y$. Two vectors with an inner product equal to zero are said
to be orthogonal. So, there is a close link between independence and orthogonality
in the space of square-integrable variables.
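
Point (4) and its special case $E(E(X \mid \mathcal{B}')) = E(X)$ can be illustrated on a finite probability space. The following sketch is not part of the proof: the four-state space, the probabilities, the values of X and the name `project` are all hypothetical choices, and each sub-tribe is represented by the partition that generates it.

```python
from fractions import Fraction

# Hypothetical four-state example; probabilities and values are illustrative.
p = [Fraction(1, 8), Fraction(3, 8), Fraction(1, 4), Fraction(1, 4)]
x = [Fraction(4), Fraction(0), Fraction(6), Fraction(2)]


def project(v, p, partition):
    """E(V | B) for the sub-tribe B generated by `partition`."""
    out = [None] * len(v)
    for block in partition:
        mass = sum(p[i] for i in block)
        mean = sum(p[i] * v[i] for i in block) / mass
        for i in block:
            out[i] = mean
    return out


fine = [[0], [1], [2, 3]]      # generates B'
coarse = [[0, 1], [2, 3]]      # generates B, with B included in B'
trivial = [[0, 1, 2, 3]]       # generates {empty set, Omega}

two_steps = project(project(x, p, fine), p, coarse)
one_step = project(x, p, coarse)
mean_x = sum(pi * xi for pi, xi in zip(p, x))

print(two_steps == one_step, project(x, p, trivial)[0] == mean_x)
```

Projecting on the finer partition first and then on the coarser one gives the same vector as projecting directly on the coarser one, and projecting on the trivial tribe returns the plain expectation.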



4.3.1 The Gaussian vector case 

To end this section on conditional expectations, we present the specific case
of Gaussian vectors⁶. Such conditioning with Gaussian vectors is common
either in the microstructure literature or in the "information" literature in
which investors are supposed to receive private signals⁷. Equilibrium prices
are more easily obtained when the private signals follow a joint Gaussian
distribution.

Definition 106 A random vector $X = (X_1, \dots, X_n)$ is said to be Gaussian if every
linear combination $\sum_{i=1}^{n} a_i X_i$ is a Gaussian variable.

Denote by $m' = (E(X_1), \dots, E(X_n))$ the vector of expectations and by $V_X$ the
covariance matrix of $X$. The density $f_X$ of $X$ is given by:

$$\forall x \in \mathbb{R}^n, \quad f_X(x) = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} \frac{1}{\sqrt{\mathrm{Det}(V_X)}} \exp\left(-\frac{1}{2}(x - m)' V_X^{-1} (x - m)\right) \tag{4.23}$$

where $\mathrm{Det}(V_X)$ is the determinant of the covariance matrix (assumed different
from zero).

Proposition 107 Let $X = (X_1, \dots, X_n)$ be a Gaussian random vector with
parameters $m$ and $V_X$; for $p < n$ let $Y_1 = (X_1, \dots, X_p)$ and $Y_2 = (X_{p+1}, \dots, X_n)$.
Decompose $V_X$ in the following way:

$$V_X = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$$

where $\Sigma_{ii}$ is the covariance matrix of $Y_i$ and $\Sigma_{ij}$ is the matrix containing
covariances between the components of $Y_i$ and $Y_j$ for $i, j = 1, 2$, $i \neq j$. The
probability distribution of $Y_1$ conditioned on $Y_2 = y_2 \in \mathbb{R}^{n-p}$ is Gaussian with
the following first two moments:

$$E(Y_1 \mid Y_2 = y_2) = E(Y_1) + \Sigma_{12} \Sigma_{22}^{-1} (y_2 - E(Y_2)) \tag{4.24}$$

$$V_{Y_1 \mid Y_2 = y_2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$$

6 Random vectors have been presented in chapter 2, section 2.5.
7 One of the seminal papers in the field is Grossman (1976).
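Case $p = 1$ and $n = 2$

Applying the above proposition when $p = 1$ and $n = 2$ gives:

$$E(X_1 \mid X_2 = x_2) = m_1 + \frac{\sigma_{12}}{\sigma_2^2}(x_2 - m_2)$$

$$V_{X_1 \mid X_2 = x_2} = \sigma_1^2 - \frac{\sigma_{12}^2}{\sigma_2^2}$$

If $\rho_{12}$ stands for the correlation between the two variables, we obtain:

$$V_{X_1 \mid X_2 = x_2} = \sigma_1^2 (1 - \rho_{12}^2)$$

This result may be obtained by using the definition of the conditional
density (with $x' = (x_1, x_2)$).

$$\begin{aligned}
\frac{f_X(x_1, x_2)}{f_{X_2}(x_2)} &= \frac{\frac{1}{2\pi \sqrt{\mathrm{Det}(V_X)}} \exp\left(-\frac{1}{2}(x - m)' V_X^{-1} (x - m)\right)}{\frac{1}{\sigma_2 \sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x_2 - m_2}{\sigma_2}\right)^2\right)} \\
&= \frac{\sigma_2}{\sqrt{2\pi} \sqrt{\sigma_1^2 \sigma_2^2 - \sigma_{12}^2}} \exp\left(-\frac{1}{2}\left[(x - m)' V_X^{-1} (x - m) - \left(\frac{x_2 - m_2}{\sigma_2}\right)^2\right]\right)
\end{aligned}$$

Calculating the distinct parts leads to:

$$\exp\left(-\frac{1}{2}\left[(x - m)' V_X^{-1} (x - m) - \left(\frac{x_2 - m_2}{\sigma_2}\right)^2\right]\right)$$

with

$$V_X^{-1} = \frac{1}{\sigma_1^2 \sigma_2^2 - \sigma_{12}^2} \begin{pmatrix} \sigma_2^2 & -\sigma_{12} \\ -\sigma_{12} & \sigma_1^2 \end{pmatrix}$$

Denoting $A = (x - m)' V_X^{-1} (x - m)$, we get:

$$\begin{aligned}
A &= \frac{\sigma_2^2 x_1^2 - 2 \sigma_2^2 x_1 m_1 - 2 x_1 \sigma_{12} x_2 + 2 x_1 \sigma_{12} m_2}{\sigma_1^2 \sigma_2^2 - \sigma_{12}^2} \\
&\quad + \frac{\sigma_2^2 m_1^2 + 2 m_1 \sigma_{12} x_2 - 2 m_1 \sigma_{12} m_2 + \sigma_1^2 x_2^2 - 2 \sigma_1^2 x_2 m_2 + \sigma_1^2 m_2^2}{\sigma_1^2 \sigma_2^2 - \sigma_{12}^2}
\end{aligned}$$

Consequently, the conditional density can be written as:

$$\frac{f_X(x_1, x_2)}{f_{X_2}(x_2)} = \frac{\sigma_2}{\sqrt{2\pi} \sqrt{\sigma_1^2 \sigma_2^2 - \sigma_{12}^2}} \exp\left(-\frac{1}{2} \frac{(-\sigma_2^2 x_1 + \sigma_2^2 m_1 + \sigma_{12} x_2 - \sigma_{12} m_2)^2}{\sigma_2^2 (\sigma_1^2 \sigma_2^2 - \sigma_{12}^2)}\right)$$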
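If we look now at what is given in the proposition:

$$E(X_1 \mid X_2 = x_2) = m_1 + \frac{\sigma_{12}}{\sigma_2^2}(x_2 - m_2), \qquad V_{X_1 \mid X_2 = x_2} = \sigma_1^2 - \frac{\sigma_{12}^2}{\sigma_2^2}$$

the corresponding density $g$ is:

$$\begin{aligned}
g(x_1) &= \frac{1}{\sqrt{2\pi} \sqrt{\sigma_1^2 - \frac{\sigma_{12}^2}{\sigma_2^2}}} \exp\left(-\frac{1}{2} \frac{\left(x_1 - m_1 - \frac{\sigma_{12}}{\sigma_2^2}(x_2 - m_2)\right)^2}{\sigma_1^2 - \frac{\sigma_{12}^2}{\sigma_2^2}}\right) \\
&= \frac{\sigma_2}{\sqrt{2\pi} \sqrt{\sigma_1^2 \sigma_2^2 - \sigma_{12}^2}} \exp\left(-\frac{1}{2} \frac{(-\sigma_2^2 x_1 + \sigma_2^2 m_1 + \sigma_{12} x_2 - \sigma_{12} m_2)^2}{\sigma_2^2 (\sigma_1^2 \sigma_2^2 - \sigma_{12}^2)}\right)
\end{aligned}$$

We then come to the desired result $g(x_1) = f_{X_1 | X_2}(x_1 \mid x_2)$.

The financial interpretation of $V_{X_1 \mid X_2 = x_2} = \sigma_1^2 (1 - \rho_{12}^2)$ when $X_2$
is a signal received by an investor is quite natural. The variance of $X_1$ is
lower after receiving the signal but the decrease depends on the correlation
of the signal with the variable $X_1$ under consideration. Obviously, the sign
of $\rho_{12}$ does not matter because a negatively correlated signal brings as much
information as a positively correlated signal.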
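
The bivariate formulas of this subsection are easy to evaluate. The sketch below applies them to hypothetical parameters (the means, variances, correlation and observed signal are illustrative numbers, and the function name `conditional_moments` is ours) and compares a positively and a negatively correlated signal.

```python
import math


def conditional_moments(m1, m2, var1, var2, cov12, x2):
    """Mean and variance of X1 given X2 = x2 for a bivariate Gaussian vector."""
    cond_mean = m1 + cov12 / var2 * (x2 - m2)
    cond_var = var1 - cov12 ** 2 / var2
    return cond_mean, cond_var


# Hypothetical parameters (illustrative only).
m1, m2 = 0.05, 0.02
var1, var2 = 0.04, 0.09
rho = 0.6
cov12 = rho * math.sqrt(var1 * var2)

mean_pos, var_pos = conditional_moments(m1, m2, var1, var2, cov12, x2=0.10)
mean_neg, var_neg = conditional_moments(m1, m2, var1, var2, -cov12, x2=0.10)

print(mean_pos, mean_neg, var_pos, var_neg)
```

The conditional mean moves in opposite directions in the two cases, while the conditional variance is the same and equals $\sigma_1^2 (1 - \rho_{12}^2)$ up to rounding.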
4.4 The law of large numbers and the central limit theorem

In numerous real situations, we have to add a large number of random vari- 
ables and the question is to know the behavior of the sum or of the average 
of these variables. For example, when studying the return of an equally 
weighted portfolio, we calculate an average return across a set of stocks. 

The return on a long-term (say 20 years) investment in a given security is
the sum of around 5000 daily returns. When successive returns are assumed
independent, a large number of i.i.d. random variables are added to get the
long-term return. Can we say something about the probability distribution
of this variable? This question is important in finance, for example when
studying the equity premium puzzle initially presented by Mehra and Prescott
(1985). Historical data show that stocks outperform bonds in the long run
by around 5 to 6% in many countries. To analyze this premium, a first point
is to assume a reasonable distribution for returns.

In other models like Ross' APT (Arbitrage Pricing Theory, 1976), the $\beta$
coefficients on the different risk factors are obtained by building an (almost
risk-free) arbitrage portfolio with a large number of assets. The specific risk
is neglected because of the diversification provided by the large number of
assets in the portfolio.

What is the mathematical result that allows us to neglect the specific risk in
a portfolio containing a large number of stocks? From the mathematical point
of view, the tools used in these models are the law of large numbers and
the central limit theorem. They are based on convergence of sequences of
random variables. We already saw convergence in $L^1$ and $L^2$. We start this
section by presenting three other types of stochastic convergence and then
address the two essential theorems. It is worth mentioning that a version
of the central limit theorem shows the convergence of an option price in the
Cox-Ross-Rubinstein (1979) model to the one obtained in the Black-Scholes
model (1973).

4.4.1 Stochastic Convergences 

Definition 108 Let $(X_n, n \in \mathbb{N})$ be a sequence of random variables and $X$
a random variable defined on a probability space $(\Omega, \mathcal{A}, P)$;

1) $(X_n, n \in \mathbb{N})$ converges to $X$ in probability (denoted as $X_n \xrightarrow{P} X$) if
for any $\varepsilon > 0$:

$$\lim_{n \to +\infty} P(|X_n - X| > \varepsilon) = 0$$

2) $(X_n, n \in \mathbb{N})$ converges to $X$ almost surely (denoted as $X_n \xrightarrow{a.s.} X$) if
there exists a set $\Omega_0 \subset \Omega$ with $P(\Omega_0) = 1$ such that:

$$\forall \omega \in \Omega_0, \quad \lim_{n \to +\infty} X_n(\omega) = X(\omega)$$

3) Let $P_{X_n}$ ($P_X$) be the probability distribution of $X_n$ ($X$); $(X_n, n \in \mathbb{N})$
converges to $X$ in distribution (denoted as $X_n \xrightarrow{d} X$) if, for any bounded
continuous function $f$:

$$\lim_{n \to +\infty} \int f\, dP_{X_n} = \int f\, dP_X$$

This convergence is also called weak convergence.

These three notions of convergence appear in the limit theorems of the
next section.

4.4.2 Law of large numbers

"It⁸ is difficult to understand why statisticians commonly limit their inquiries
to averages, and do not revel in more comprehensive views. Their souls seem
as dull to the charm of variety as that of the native of one of our flat English
counties, whose retrospect of Switzerland was that, if its mountains could be
thrown into its lakes, two nuisances would be got rid of at once". F. Galton

We study here the behavior of the average of a large number of random
variables, by characterizing the expectation and the variance of the mean.
Preliminary results will be useful to get laws of large numbers.

Proposition 109 Markov inequality

Let $X$ be a random variable taking positive values, being integrable with
$E(X) = \mu$. For any $\lambda > 0$ the following inequality is satisfied:

$$P(X \geq \lambda \mu) \leq \frac{1}{\lambda}$$

Obviously this result is interesting only if $\lambda > 1$. Note that no assumption
is made on the type of probability distribution followed by $X$. It gives
a bound for the probability that a given random variable goes above a given
multiple of its own expectation. Markov inequality is valid in a very general
framework. In particular, it is not assumed that $X$ has a finite variance. But
if it is the case, a more specific result is obtained as follows.

8 I borrowed this citation from Koch-Medina and Merino (2003), p. 221.

Proposition 110 Bienaymé-Tchebychev inequality

Let $X \in L^2(\Omega, \mathcal{A}, P)$ such that $E(X) = \mu$ and $V(X) = \sigma^2$; for any
$\beta > 0$ we have:

$$P(|X - \mu| \geq \beta) \leq \frac{\sigma^2}{\beta^2}$$

A financial illustration of this result is given by Jorion (2000) in his book
Value at Risk (see chapter 1, example 31). The Basel committee requires
a 99% level and a 10-day horizon to calculate the VaR. The amount of
required capital obtained by this calculation is then multiplied by a safety
coefficient equal to 3. The preceding inequality can also be written as:

$$P(|X - \mu| \geq \lambda \sigma) \leq \frac{1}{\lambda^2}$$

where $\lambda$ is a positive constant. If $X$ has a symmetrical distribution, we get:

$$P(X - \mu \leq -\lambda \sigma) \leq \frac{1}{2 \lambda^2}$$

For the RHS to be equal to 0.01, we need $\lambda = \sqrt{\frac{1}{2 \times 0.01}} = 7.0711$. However,
banks often assume Gaussian returns. With Gaussian returns $\lambda \approx 2.33$, that
is about 3 times less than the number obtained without this assumption.
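
The two multipliers can be recomputed in a few lines. This is only a sketch using the Python standard library; the variable names are ours, and the Gaussian value is the 99% quantile of the standard normal distribution.

```python
import math
from statistics import NormalDist

alpha = 0.01   # tail probability matching the 99% confidence level

# Distribution-free multiplier: solve 1 / (2 * lam**2) = alpha (symmetric case).
lam_chebyshev = math.sqrt(1 / (2 * alpha))

# Gaussian multiplier: the 99% quantile of the standard normal distribution.
lam_gauss = NormalDist().inv_cdf(1 - alpha)

print(round(lam_chebyshev, 4), round(lam_gauss, 4), round(lam_chebyshev / lam_gauss, 2))
```

The ratio of the two multipliers is close to 3, which is the order of magnitude of the safety coefficient mentioned above.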
Proposition 111 Weak law of large numbers

Let $(X_n, n \in \mathbb{N})$ be a sequence of square integrable and identically distributed
random variables (with expectation $\mu$ and variance $\sigma^2$), pairwise independent.
Let $Z_n = \frac{1}{n} \sum_{i=1}^{n} X_i$; $(Z_n, n \in \mathbb{N})$ converges in probability to the
(constant) random variable $\mu$. Moreover, for any $\varepsilon > 0$:

$$P(|Z_n - \mu| \geq \varepsilon) \leq \frac{\sigma^2}{n \varepsilon^2}$$

Remark: The second part of the proposition is obtained by applying
proposition 110.
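
The bound of proposition 111 can be compared with simulated frequencies. The sketch below is an illustration under assumptions of our own: the variables are uniform on (0, 1), so that $\mu = 1/2$ and $\sigma^2 = 1/12$, the tolerance is $\varepsilon = 0.05$, and the seed, the number of simulated paths and the function name are arbitrary.

```python
import random


def tail_frequency(n, eps, n_paths, rng):
    """Empirical frequency of |Z_n - mu| >= eps for uniform(0, 1) draws."""
    mu = 0.5
    hits = 0
    for _ in range(n_paths):
        z_n = sum(rng.random() for _ in range(n)) / n
        if abs(z_n - mu) >= eps:
            hits += 1
    return hits / n_paths


rng = random.Random(0)          # fixed seed so that the run is reproducible
sigma2 = 1 / 12                 # variance of the uniform(0, 1) distribution
eps = 0.05

results = []
for n in (10, 100, 1000):
    freq = tail_frequency(n, eps, n_paths=2000, rng=rng)
    bound = min(1.0, sigma2 / (n * eps ** 2))   # a probability never exceeds 1
    results.append((n, freq, bound))
    print(n, freq, bound)
```

The simulated frequencies stay below the bound, which is typically far from tight: the inequality holds for any distribution with finite variance, and this generality has a price.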

Convergence in probability is not very intuitive. A sufficient condition 
can be obtained when square integrable variables are considered. 

Proposition 112 Let $(X_n, n \in \mathbb{N})$ be a sequence of square integrable random
variables; $X_n$ converges to $X$ in probability ($X$ is also assumed in $L^2$) if the
two following conditions are satisfied:

a) $\lim_{n \to +\infty} E(X_n) = E(X)$

b) $\lim_{n \to +\infty} V(X_n - X) = 0$

Proposition 113 Strong law of large numbers

Let $(X_n, n \in \mathbb{N})$ be a sequence of square integrable i.i.d. random variables
with expectation $\mu$ and $Z_n = \frac{1}{n} \sum_{i=1}^{n} X_i$.

$(Z_n, n \in \mathbb{N})$ converges almost surely to $\mu$.

On the contrary, if $E(|X_n|) = +\infty$, the sequence $Z_n$ is almost surely
unbounded.

Laws of large numbers ensure the dividends of insurance companies'
shareholders. A large number of identical but independent policies reduces the
variance of future liabilities. Taking into account the risk aversion of agents,
companies are able to charge clients more than the expected damage (pure
premium). The diversification of the portfolio of liabilities of a company allows
it to reduce their dispersion, generating, in most cases, a profit.

In the APT model, a multifactor structure of returns is assumed as follows.

$$\tilde{r}_i = E(\tilde{r}_i) + \sum_{k=1}^{K} \beta_{ik} F_k + \varepsilon_i \tag{4.25}$$

where $\tilde{r}_i$ is the random return of asset $i$, $F_1, \dots, F_K$ are random variables
representing common factors. $\beta_{ik}$ is the sensitivity of stock $i$ to the variations
of factor $k$ and, finally, $\varepsilon_i$ is a random variable representing the specific risk
of firm $i$. The common factors are assumed uncorrelated ($Cov(F_k, F_j) = 0$
for $j \neq k$) and uncorrelated with specific risks ($Cov(F_k, \varepsilon_i) = 0$). Finally,
specific risks are uncorrelated ($Cov(\varepsilon_i, \varepsilon_m) = 0$ for $i \neq m$).

The return on a given asset is then divided in two parts. The first one
is linked to the common risk factors and the second to a specific factor.
More precisely, consider a large number of stocks $N$; the return of an equally
weighted portfolio is written as:

\begin{align}
\frac{1}{N} \sum_{i=1}^{N} \tilde{r}_i &= \frac{1}{N} \sum_{i=1}^{N} E(\tilde{r}_i) + \frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} \beta_{ik} F_k + \frac{1}{N} \sum_{i=1}^{N} \varepsilon_i \tag{4.26} \\
&= \frac{1}{N} \sum_{i=1}^{N} E(\tilde{r}_i) + \sum_{k=1}^{K} \left(\frac{1}{N} \sum_{i=1}^{N} \beta_{ik}\right) F_k + \frac{1}{N} \sum_{i=1}^{N} \varepsilon_i \tag{4.27}
\end{align}

Large portfolios make it possible to diversify away the specific risk, because of the
law of large numbers. In other words, the variance of $\frac{1}{N} \sum_{i=1}^{N} \varepsilon_i$ tends to 0 when
the number of stocks in the portfolio tends to infinity (provided the variances
of the specific risks remain bounded).
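
The vanishing of the specific risk can be made explicit with a short computation. The bound $\bar{\sigma}^2$ on the variances of the specific risks is an assumption introduced here for the sketch; the uncorrelatedness of the $\varepsilon_i$ is the one assumed in the model.

```latex
V\left( \frac{1}{N} \sum_{i=1}^{N} \varepsilon_i \right)
  = \frac{1}{N^2} \sum_{i=1}^{N} V(\varepsilon_i)
  \leq \frac{\bar{\sigma}^2}{N}
  \xrightarrow[N \to +\infty]{} 0
```

The first equality uses $Cov(\varepsilon_i, \varepsilon_m) = 0$ for $i \neq m$, so that no covariance term appears. The factor terms in (4.27) do not vanish in the same way, because each factor $F_k$ is common to all the assets.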



4.4.3 Central limit theorem 

The central limit theorem explains why the Gaussian distribution is so impor- 
tant in all scientific fields. We provide hereafter two versions of the theorem. 
The first one assumes that the variables entering the mean are distributed 
according to the Bernoulli distribution and gives the intuition of why the 
Cox-Ross-Rubinstein model converges to the Black-Scholes model. 

Proposition 114 Central limit theorem (CLT)

Let $(X_n, n \in \mathbb{N})$ be a sequence of i.i.d. Bernoulli random variables with
parameter $p$; the sequence $T_n$ defined by:

$$T_n = \frac{\sum_{i=1}^{n} X_i - np}{\sqrt{np(1 - p)}}$$

converges weakly to the standard Gaussian distribution.
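
The convergence stated in proposition 114 can be observed without simulation, since the sum of the $X_i$ follows a binomial distribution. The sketch below is an illustration with hypothetical choices ($p = 0.5$ and three values of $n$; the function name is ours): it measures the largest gap between the exact distribution function of $T_n$ and the standard Gaussian one at the values $T_n$ can take.

```python
import math
from statistics import NormalDist


def max_cdf_gap(n, p):
    """Largest gap between the exact CDF of T_n and the standard normal CDF,
    evaluated at the attainable values of T_n."""
    std_normal = NormalDist()
    scale = math.sqrt(n * p * (1 - p))
    cdf = 0.0
    gap = 0.0
    for k in range(n + 1):
        cdf += math.comb(n, k) * p ** k * (1 - p) ** (n - k)   # P(sum = k)
        t = (k - n * p) / scale                                # value of T_n
        gap = max(gap, abs(cdf - std_normal.cdf(t)))
    return gap


gaps = {n: max_cdf_gap(n, p=0.5) for n in (25, 100, 400)}
print(gaps)
```

The gap shrinks as $n$ grows, roughly like $1/\sqrt{n}$ in this example.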

This version of the CLT is not sufficient to obtain the convergence of the
binomial model to the Black-Scholes model because the parameter $p$ doesn't
depend on $n$. In the Cox-Ross-Rubinstein model, parameters $u$ and $d$ depend
on the delay between two trading dates and this delay tends to 0 when the
number of sub-periods increases. The probability of an up-state also depends
on the number of sub-periods. We then need a "dynamic" version of the
theorem.

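Definition 115 Let⁹ $Y = (Y_1^n, \dots, Y_{k_n}^n;\ n \geq 1)$ be a triangular array of zero
mean random variables. For any $n$, let $s_n^2 = V\left(\sum_{i=1}^{k_n} Y_i^n\right)$. $Y$ satisfies the
Lindeberg condition if, for any $\varepsilon > 0$, the sequence $U = (U_1^n, \dots, U_{k_n}^n;\ n \geq 1)$
defined by:

$$U_i^n = \begin{cases} Y_i^n & \text{if } |Y_i^n| \leq \varepsilon s_n \\ 0 & \text{otherwise} \end{cases}$$

satisfies:

$$\lim_{n \to \infty} \frac{V\left(\sum_{i=1}^{k_n} U_i^n\right)}{s_n^2} = 1$$
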

The following proposition provides the right version of the CLT to study 
the convergence of the discrete-time option pricing model to the continuous- 
time model. 

Proposition 116 Let $Y = (Y_1^n, \dots, Y_{k_n}^n;\ n \geq 1)$ be a triangular array of
random variables such that the zero mean sequence
$(Y_1^n - E(Y_1^n), \dots, Y_{k_n}^n - E(Y_{k_n}^n);\ n \geq 1)$
satisfies the Lindeberg condition. For any integer $n \geq 1$, let $Z_n = \sum_{i=1}^{k_n} Y_i^n$.
If $E(Z_n) \to \mu$ and $V(Z_n) \to \sigma^2 \neq 0$, the sequence $Z_n$ weakly converges to a
Gaussian variable $Z$ with expectation $\mu$ and variance $\sigma^2$.

This proposition is useful in calibrating the discrete-time model of
Cox-Ross-Rubinstein where you need to define the parameters $u$ and $d$
characterizing the stock price process. $u$ and $d$ are chosen to keep constant the yearly
expected return and variance, independently of the duration of sub-periods
(see Hull, 2009, p. 248-249).
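
To see what "keeping the yearly moments constant" means in practice, the sketch below uses one common parameterisation, $u = e^{\sigma \sqrt{\Delta t}}$, $d = 1/u$, together with the risk-neutral up probability. This specific choice, the numerical values and the function names are assumptions made for the illustration; they are not taken from the text above.

```python
import math


def crr_parameters(sigma, r, horizon, n):
    """One common CRR parameterisation (an assumption, see the lead-in)."""
    dt = horizon / n
    u = math.exp(sigma * math.sqrt(dt))
    d = 1 / u
    p = (math.exp(r * dt) - d) / (u - d)   # risk-neutral up probability
    return u, d, p


def log_return_moments(sigma, r, horizon, n):
    """Mean and variance of the log-return over the whole horizon."""
    u, d, p = crr_parameters(sigma, r, horizon, n)
    step_mean = p * math.log(u) + (1 - p) * math.log(d)
    step_var = p * (1 - p) * (math.log(u) - math.log(d)) ** 2
    return n * step_mean, n * step_var   # the n steps are i.i.d.


sigma, r, horizon = 0.20, 0.05, 1.0    # hypothetical yearly parameters
for n in (10, 100, 1000):
    mean, var = log_return_moments(sigma, r, horizon, n)
    print(n, round(mean, 5), round(var, 5))
```

As the number of sub-periods grows, the yearly mean and variance of the log-return stabilise (here near $r - \sigma^2/2$ and $\sigma^2$), although $u$, $d$ and the up probability all change with $n$. This is the situation covered by the triangular array in proposition 116.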



9 This definition and the following proposition can be found in Duffie (1988), p. 244-246.